Message from the Chairs

Welcome to the Eighth IEEE Statistical Signal and Array Processing (SSAP) Workshop. Corfu, in Greek,
means apex and thus sounds most appropriate as a choice to host the biennial SSAP'96 summit of the
Signal Processing Society. We look forward to an exciting and memorable meeting. The workshop venue
is the Corfu Hilton in the heart of Corfu, featuring beaches amid cliffs and pines, and an atmosphere that
promotes the exchange of technical ideas while enjoying the Greco-Ionian ambiance.

Statistical  signal  and  array  processing  continues  to  be  the  backbone  of  many  real-world  engineering 
applications,  and  consistent  with  previous  meetings,  we  expect  SSAP'96  to  continue  the  tradition  of 
excellence in the technical quality of presentations on state-of-the-art research. The international character
of  the  workshop  keeps  growing,  and  this  year's  meeting,  being  the  first  one  to  move  away  from  North 
America,  is  well  attended  by  European  participants.  As  with  previous  SSAP  meetings,  we  have 
introduced  some  changes  in  the  organization  and  the  emphasis  of  the  meeting.  Correspondence  with 
authors  was  primarily  via  e-mail,  and  for  publicity  and  notifications  we  relied  heavily  on  our  regularly 
updated  home  page  (http://watt.seas.virginia.edu/~spirit/ssap96/).  Thanks  to  external  support,  we 
were  able  to  offer  bargain  basement  registration  fees  ($550  for  regular  and  $450  for  student  attendees). 

We received 270 summaries from 45 countries, a record number of submissions for SSAP. Each
submission  was  scored  by  three  reviewers,  and  in  order  to  maintain  the  workshop's  atmosphere  we 
accepted  only  139  papers  which  we  expect  to  be  of  high  quality.  Our  apologies  to  authors  whose  fine 
submissions  we  could  not  accommodate,  and  our  sincere  thanks  to  reviewer  experts,  mostly  drawn  from 
the  SSAP  Technical  Committee,  for  their  help  with  excellent  and  timely  reviews.  Signal  Processing  for 
Communications  and  Array  Signal  Processing  were  well  represented  in  the  number  of  submissions  (and 
thus in the number of accepted papers). Applications, detection-estimation, non-Gaussian, non-stationary,
and  nonlinear  formed  other  well-defined  clusters,  and  all  are  represented  in  the  ten  poster  sessions  and 
five  outstanding  plenary  talks.  The  center  of  focus  for  this  year's  research  theme  is  SSAP  for 
Communications. 

Our  warm  thanks  go  to  the  volunteers  of  the  international  program  committee,  the  European  and 
Austral-Asian  liaisons,  the  publication,  publicity,  and  local  arrangement  chairs.  The  informative  and 
creative  home  page  prepared  by  Guotong  Zhou  contributed  significantly  to  the  workshop  (its  format  is 
now  being  used  as  a  template  by  other  workshops).  Maria  Rangoussi's  efforts  in  Greece  are  also  greatly 
appreciated  (she  bridged  the  transatlantic  distance  with  the  organizers  in  the  US).  We  finally  wish  to 
acknowledge  support  from  the  U.S.  Army  Research  Office,  the  U.S.  Office  of  Naval  Research,  the  Greek 
General  Secretariat  for  Research  and  Technology,  and  the  Greek  companies  Intracom  and  Alpha. 

We hope that your stay in Corfu will not only be technically enriching but will also give you the
opportunity to meet new fellow researchers, renew old acquaintances, and enjoy the Greek sea and
sun. We look forward to meeting you in Corfu.

Georgios  B.  Giannakis  and  Ananthram  Swami 
Co-organizers  and  Co-Chairs 
Abstract

A simple, flexible, and robust procedure to detect regularity in point processes versus the alternative of randomness (i.e., a Poisson point process) is the empty boxes test (EBT). The EBT can be extended to a multivariate statistical test in several ways, including an implementation of a skeptical likelihood test (SLT). These approaches have previously been used to detect the regularity of minefields, a two-dimensional point process, where the alternative is termed complete spatial randomness (CSR). In this paper, these methods are applied to the problem of detecting regularity in chaotic signals such as pseudo-random number generators.


1.  INTRODUCTION 

Detecting minefields in the presence of clutter is an important challenge for the Navy. Minefields have point patterns that tend to exhibit regularity, such as equal spacing and collinearity, which provides potentially valuable discriminants against naturally occurring clutter, which tends to exhibit complete spatial randomness (CSR). These tendencies arise because of a variety of compelling factors including strategic doctrine, safety, tactical and economic efficiency, and, perhaps most intriguing, the human element. In [4] and [5], several simple procedures were introduced to detect regularity in minefields and other point processes generated by humans (e.g., lottery numbers). Figure 1 shows an example of a minefield that is not so apparent with the addition of clutter points.

Another important problem where regularity is being detected as an alternative to randomness is the identification of chaotic signals. Chaos theory is being used to develop low probability of intercept (LPI) and spread spectrum communication signals where traditional detection methods would fail. In these cases, the EBT and its variants are alternative approaches to detection worth considering. A particularly interesting example to illustrate this claim is a pseudo-random number generator (which is actually a deterministic, chaotic process) with a white spectrum.

Figure 1. Example of a minefield with 50 mines and 50 additional random clutter points

2.  TESTS  TO  DETECT  REGULARITY 

A variety of methods to detect deviations from CSR in point patterns have been developed, for the most part against the alternative of a tendency towards clustering rather than a tendency towards regularity. Cressie provides a comprehensive overview of these and other techniques with a demonstration on the longleaf pine data set [2]. Some alternative approaches are introduced below.

2.1.  Empty  boxes  test 

Consider a CSR process with $n$ points on a set $A$ that has been partitioned into $N$ regions of equal area, to be referred to informally as boxes. A variety of tests to detect regularity can be based on $M_0, M_1, \ldots, M_n$ and $Y_1, Y_2, \ldots, Y_N$, where $M_r$ and $Y_i$ are random variables denoting respectively the number of boxes containing exactly $r$ points and the number of points in box $i$.

A simple test for regularity is based on $M_0$, that is, the number of empty boxes. The so-called empty boxes test (EBT) based on $M_0$ has been around for at least forty years [3], but has traditionally been used to detect the presence of too many empty boxes as an indication of lack of fit. In this context, regular point processes (and humans) tend to overfit and clustered point processes tend to not fit well. A disadvantage of the EBT for minefield detection is that there is no explicit modelling of collinearity and regular spacing, per se; the EBT is a generic regularity detector. However, the advantages of the EBT include its flexibility, lack of edge effects, and robustness.

Another advantage of the EBT is that the null moments of the test statistic can be calculated exactly without independence assumptions on $Y_1, Y_2, \ldots, Y_N$. The expected value and variance of $M_0$ under CSR are given by

$$\mu_0 = E[M_0] = N p_0 \qquad (1)$$

$$\sigma_0^2 = \mathrm{Var}[M_0] = \mu_0 + N(N-1)\,p_{00} - \mu_0^2 \qquad (2)$$

where

$$p_0 = \Pr\{Y_i = 0\} = \left(1 - \frac{1}{N}\right)^n \qquad (3)$$

$$p_{00} = \Pr\{Y_i = 0,\ Y_j = 0\} = \left(1 - \frac{2}{N}\right)^n \qquad (4)$$

and $i \neq j$ in (4).
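
The null moments (1)-(4) are simple to evaluate numerically. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, and the lower-tail P-value assumes a normal approximation for $M_0$, which is suggested by the z-values quoted in the examples but is not stated explicitly in this section.

```python
import math

def ebt_null_moments(n, N):
    """Mean and variance of M0 (number of empty boxes) under CSR, per (1)-(4)."""
    p0 = (1.0 - 1.0 / N) ** n       # (3): a given box is empty
    p00 = (1.0 - 2.0 / N) ** n      # (4): two given distinct boxes are both empty
    mu0 = N * p0                    # (1)
    var0 = mu0 + N * (N - 1) * p00 - mu0 ** 2   # (2)
    return mu0, var0

def ebt_z_test(m0, n, N):
    """z-value of the observed M0 and its lower-tail P-value.

    Assumption: M0 is treated as approximately normal under CSR.
    Few empty boxes (negative z) point towards regularity.
    """
    mu0, var0 = ebt_null_moments(n, N)
    z = (m0 - mu0) / math.sqrt(var0)
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))   # standard normal CDF at z
    return z, p_lower

z, p = ebt_z_test(m0=84, n=256, N=256)
```

With the values of the pseudo-random generator example of Section 3.1 ($n = N = 256$, $M_0 = 84$), this sketch gives a z-value close to $-2.00$ and a P-value close to .023, in line with the figures quoted there.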

2.2.  Generalizing  the  empty  boxes  test 

The empty boxes test can be generalized by using $M_1, M_2, \ldots$, in addition to $M_0$ to form test statistics. For general $r$ and $s$ the moments analogous to (1) and (2) are given by

$$\mu_r = N p_r \qquad (5)$$

$$\sigma_r^2 = \mu_r + N(N-1)\,p_{rr} - \mu_r^2 \qquad (6)$$

$$\sigma_{rs} = N(N-1)\,p_{rs} - \mu_r \mu_s \qquad (7)$$

Let $\mathbf{M} = \mathbf{M}_k$ be the multivariate statistic vector $(M_0, M_1, \ldots, M_k)^T$ with mean $\mathbf{m}$ and covariance $\Sigma$. Under appropriate mild conditions, the quadratic form

$$Q = Q_k = (\mathbf{M} - \mathbf{m})^T \Sigma^{-1} (\mathbf{M} - \mathbf{m}) \qquad (10)$$

is approximately $\chi^2$ with $k + 1$ degrees of freedom under CSR. By combining the sign of $M_0 - \mu_0$ with the strictly nonnegative $Q_k$ to form the real-valued statistic

$$D = D_k = \mathrm{sign}(M_0 - \mu_0)\, Q_k \qquad (11)$$

a one-sided test can be constructed. Positive values of $D$ indicate clustering and negative values indicate regularity. Tests based on $D_0$ are equivalent to the EBT. Moreover, $Q_1$ is approximately exponential so that the test statistic $D_1$ is approximately double exponential. A one-sided test for regularity can be constructed using the approximation

$$\Pr\{D_1 < -d\} = \qquad (12)$$

where $d > 0$.
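
The statistics $Q_k$ and $D_k$ of (10) and (11) can be evaluated directly from the null moments (5)-(7). The sketch below is illustrative only: the function names are hypothetical, the box probabilities are the multinomial expressions $p_r$ and $p_{rs}$ of (8) and (9), and a small Gaussian elimination routine is used in place of an explicit inverse of $\Sigma$. No P-value is computed, since the right-hand side of (12) is not reproduced here.

```python
import math

def box_prob(r, n, N):
    """p_r: a given box holds exactly r of the n points."""
    return math.comb(n, r) * (1.0 / N) ** r * (1.0 - 1.0 / N) ** (n - r)

def box_pair_prob(r, s, n, N):
    """p_rs: two given distinct boxes hold exactly r and s points."""
    if r + s > n:
        return 0.0
    coef = math.comb(n, r) * math.comb(n - r, s)
    return coef * (1.0 / N) ** (r + s) * (1.0 - 2.0 / N) ** (n - r - s)

def null_mean_cov(k, n, N):
    """Mean vector and covariance matrix of (M0, ..., Mk) under CSR, per (5)-(7)."""
    mean = [N * box_prob(r, n, N) for r in range(k + 1)]
    cov = [[0.0] * (k + 1) for _ in range(k + 1)]
    for r in range(k + 1):
        for s in range(k + 1):
            pair = N * (N - 1) * box_pair_prob(r, s, n, N)
            cov[r][s] = pair - mean[r] * mean[s] + (mean[r] if r == s else 0.0)
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, m):
            factor = aug[i][c] / aug[c][c]
            for j in range(c, m + 1):
                aug[i][j] -= factor * aug[c][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x

def signed_quadratic_statistic(counts, n, N):
    """counts = [M0, ..., Mk]; returns (Q_k, D_k) as in (10) and (11)."""
    mean, cov = null_mean_cov(len(counts) - 1, n, N)
    dev = [m - mu for m, mu in zip(counts, mean)]
    q = sum(d * w for d, w in zip(dev, solve(cov, dev)))
    sign = (dev[0] > 0) - (dev[0] < 0)
    return q, sign * q

q1, d1 = signed_quadratic_statistic([84, 106], n=256, N=256)
```

For the counts of the pseudo-random generator example in Section 3.1 ($M_0 = 84$, $M_1 = 106$, $n = N = 256$), this evaluates to $D_1 \approx -4.475$, which agrees with the value quoted there.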

2.3.  Skeptical  likelihood  test 

It can be shown that the most likely configuration under CSR would reject CSR under the empty boxes test. The reason for this apparent paradox is that the test is rejecting observations that are too likely under the null hypothesis, suggesting that some skepticism is in order. Generally, even distributions of the points among the regions are more likely than uneven distributions. Without specifying an alternative, a skeptical likelihood test (SLT) for a statistic $T$ with null distribution $f$ is to reject $H_0$ for high values of $f(T)$.

A skeptical likelihood test for minefield detection can be based on the test statistic

$$T = \sum_{r=2}^{n} M_r \log r! \qquad (13)$$

where significantly small values of $T$ indicate regularity. The mean and variance of (13) can be calculated directly using (5), (6), and (7). In practice, the summation in (13) can be truncated to simplify the computation.
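
As a small illustration, the sum in (13) can be evaluated from the occupancy counts alone. The sketch below is an assumption-laden example rather than the authors' code: the function name is hypothetical, and `math.lgamma(r + 1)` is used for $\log r!$. Passing a shorter list of counts corresponds to truncating the summation.

```python
import math

def slt_statistic(box_counts):
    """Sum of M_r * log(r!) as in (13); box_counts[r] = M_r.

    The terms r = 0 and r = 1 contribute nothing because log(0!) = log(1!) = 0.
    """
    return sum(m_r * math.lgamma(r + 1)
               for r, m_r in enumerate(box_counts) if r >= 2)

t = slt_statistic([84, 106, 49, 16, 1])
```

For the occupancy counts of the pseudo-random generator example in Section 3.1, the sum evaluates to about 65.8104. This matches the magnitude of the value of $T$ quoted there, which is printed with a minus sign; the sign convention behind that is not recoverable from this section.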

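In the moment expressions (5)-(7), $\sigma_{rs} = \mathrm{cov}[M_r, M_s]$ for $r \neq s$ and

$$p_r = \binom{n}{r} \left(\frac{1}{N}\right)^{r} \left(1 - \frac{1}{N}\right)^{n-r} \qquad (8)$$

$$p_{rs} = \frac{n!}{r!\, s!\, (n-r-s)!} \left(\frac{1}{N}\right)^{r+s} \left(1 - \frac{2}{N}\right)^{n-r-s} \qquad (9)$$

as in (3) and (4).

2.4. Detection Performance Results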


To demonstrate the EBT methods on the clutter example ($n = 100$) in Figure 1, a value of $N = 100$ was selected and the 80x720 region was divided into a 5x20 grid of rectangles of equal size (16x36). One could think of this example as having an SNR of 0 dB. The statistics for this partition are $M_0 = 30$, $M_1 = 45$, $M_2 = 20$, and $M_3 = 5$, leading to P-values of .017, .045, and .015 respectively for the EBT, $D_1$, and SLT.
Figure 2. Empirical ROC curve for a point process with 50 mines and 50 additional random clutter points (plot title: Empirical ROC Curve with 50 Clutter Points)

Figure 3. Example of a pseudo-random process (plot title: Uniform Pseudo-Random Generator)


In order to get a better understanding of the relative performance of these three methods, 100 realizations of the 50 random clutter points were simulated. Figure 2 shows the resulting empirical ROC curves, indicating that the three methods are fairly similar. For example, with a false alarm rate of .1, the probability of detection is approximately .8. This performance is impressive considering that the patterns are not always visually obvious and that these methods have no explicit modelling of collinearity and, except perhaps for the selected dimensions of the regions, no modelling of equal spacing.

3.  Examples  of  Chaotic  Signals 

Characterizing the difference between randomness and chaos is a fundamental question that is perhaps more philosophical in nature than mathematical, statistical, or physical. As discussed recently in [1], a striking example that illustrates the fuzzy boundary between these concepts is the pseudo-random number generator.

3.1.  Pseudo-Random  Number  Generators 

Uniform random variates $U_1, U_2, U_3, \ldots$ can be generated by multiplicative congruential methods of the form

$$U_{k+1} = a\, U_k \bmod T \qquad (14)$$

along with some initial integer "seed" value $U_0$ (for example, see [8] pages 377-388). This method will necessarily repeat, but the constants $a$ and $T$ can be selected in such a manner as to give a period on the order of $T$ and a white spectrum.

The EBT will be demonstrated on an example with $n = 256$ samples from the chaotic process with parameters $a = 31623$ and $T = 2^{16} - 1 = 65535$. The time series realization ($U_0 = 14349$), normalized to give uniform deviates on the unit interval, is displayed in Figure 3 along with its spectrum. With $N = 256$ equally spaced intervals, there are $M_0 = 84$ empty boxes, which gives a statistically significant z-value of $-2.0024$ ($P = .0226$). The other statistics have values of $M_1 = 106$, $M_2 = 49$, $M_3 = 16$, and $M_4 = 1$, which lead to less significant results of $D_1 = -4.4754$ ($P = .0673$) and $T = -65.8104$ ($P = .0881$) but still provide some evidence that the sequence is not random. Another pseudo-random generator that has been discussed extensively in [1] and [6] has the parameters $a = 16807$ and $T = 2^{31} - 1 = 2147483647$.
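
The generator (14) and the box occupancy counts $M_r$ can be sketched in a few lines of Python. This is a hedged illustration with hypothetical function names. Two details are assumptions, since the text does not specify them: the seed $U_0$ itself is not counted among the 256 samples, and the deviates are normalized as $U_k / T$. With different conventions the counts may differ, so the sketch is not claimed to reproduce $M_0 = 84$.

```python
from collections import Counter

def multiplicative_congruential(a, T, seed, count):
    """Return `count` successive values of U_{k+1} = a * U_k mod T after the seed."""
    values, u = [], seed
    for _ in range(count):
        u = (a * u) % T
        values.append(u)
    return values

def box_occupancy(samples, N):
    """M[r] = number of the N equal-width boxes on [0, 1) holding exactly r samples."""
    per_box = Counter(min(int(x * N), N - 1) for x in samples)
    occupancy = Counter(per_box.values())
    occupancy[0] = N - len(per_box)          # boxes that received no sample
    return [occupancy.get(r, 0) for r in range(max(occupancy) + 1)]

# Parameters quoted in the text; normalization by T is an assumption.
raw = multiplicative_congruential(a=31623, T=65535, seed=14349, count=256)
uniform = [u / 65535 for u in raw]
M = box_occupancy(uniform, N=256)
```

The resulting list `M` can be passed to an empty boxes test or to the other statistics of Section 2; `M[0]` is the number of empty boxes.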

3.2. Kakutani-von Neumann Map

In this section we look at a minefield generated by a variant of the Kakutani / von Neumann map shown in Figure 4, which we will denote by the function $K$. The map $K$ is an invertible, measure preserving map of the unit interval with a derivative of 1 (almost everywhere with respect to Lebesgue measure) that is weak mixing but not strong mixing (see [7] for details).

Figure 4. Kakutani / von Neumann Map

The $(x, y)$ locations for the points in Figure 5 were
generated by

Xk+l  —  I^{^k)  y/c+l  =  -f  (2/A:  +  ^fe)^/l28)  (15) 

The unit square was partitioned into $N = 400$ sections to use the EBT. The $M_0 = 108$ empty boxes are significantly fewer than expected under CSR (z-value = -5). In this case, the CSR hypothesis is rejected for a tendency to cluster. However, there are clearly regularities and periodicities of this "minefield" that could be exploited as well.

4.  Conclusions 

The  empty  boxes  test  and  its  extensions  offer  simple, 
flexible,  and  robust  approaches  to  detecting  regularity  in 
point  processes.  These  methods  are  particularly  applicable 
to  the  problem  of  characterizing  the  difference  between 
random  and  chaotic  processes  as  was  demonstrated  on  some 
nontrivial  examples. 

5.  Acknowledgements 


Figure 5. Chaotic minefield with 500 points using a perturbation of the Kakutani / von Neumann map.
Abstract

Min-max simultaneous signal detection and parameter estimation requires the solution to a nonlinear optimization problem. Under certain conditions, the solution can be obtained by equalizing the probabilities of correctly estimating the signal parameter over the parameter range. We present an iterative algorithm based on Newton's root finding method to solve the nonlinear min-max optimization problem through explicit use of the equalization criterion. The proposed iterative algorithm does not require prior proof of whether an equalizer rule exists: convergence of the algorithm implies existence. A theoretical study of algorithm convergence is followed by an amplitude estimation example which shows that decoupling detection from estimation entails a very significant loss in estimation performance even when optimal decoupled decision rules are implemented.


1.  Introduction 

In practical applications, one frequently needs to design a signal detector or a signal parameter estimator without complete knowledge of the signal or noise model. Several approaches to detector and estimator design exist in the case of incompletely characterized models. Among these are invariance methods, Bayesian methods which use non-informative priors, and min-max methods. Min-max methods form an important solution category because they ensure optimal detector or estimator performance under worst case conditions. Furthermore, min-max solutions give rise to tight performance bounds which can be used to benchmark sub-optimal or ad hoc algorithms. Min-max methods have been applied to problems of adaptive array processing, harmonic retrieval, CFAR detection, and distributed detection.

Signal detection and signal parameter estimation are typically considered as separate problems. In other words, signal parameter estimation methods assume that there is no uncertainty about signal presence. However, there are many applications where signal parameter estimation has to be done under signal presence uncertainty, such as fault detection and estimation in dynamical system control and antenna array processing. Such problems are referred to as simultaneous detection and estimation problems. A min-max solution to simultaneous detection and estimation was recently given in [2]. The problem considered in [2] is estimation of a discrete parameter under a false alarm constraint. The statistical decision procedure which solves the problem is called the constrained min-max classifier. The constrained min-max classifier is characterized by a set of optimal weights. In Bayesian terminology, the optimal weights represent a least favorable distribution on the unknown parameter values. Numerical solutions to min-max detection or estimation problems involve nonlinear optimization to obtain the least favorable distribution [3, 1]. On the other hand, under certain assumptions, it is possible to formulate a min-max solution by making explicit use of a simplifying sufficient condition for min-max optimality. In the case of the constrained min-max classifier, this sufficient condition is the equalization of the correct classification probabilities. The purpose of the present work is to present an iterative algorithm for efficiently computing the constrained min-max classifier through the equalization condition. An important attribute of the proposed iterative algorithm is that it does not require prior proof of existence of an equalizer rule. Convergence of the algorithm proves existence, i.e. if we observe convergence, then the associated solution is the constrained min-max classifier.

The correct classification probability of the constrained min-max classifier provides a tight lower bound on the correct classification probability of any similarly constrained detection and classification procedure. By using the proposed algorithm, we can compute both this lower bound and the classification performance of sub-optimal simultaneous detection and classification procedures. Comparison of the performance of sub-optimal procedures with the lower bound allows us to assess the performance loss incurred by employing a sub-optimal approach to simultaneous detection and classification.

2.  Problem  Formulation 

Consider the indexed probability space $(\Omega, \sigma, P_\mu)$, where $\mu$ is a parameter that lies in a finite discrete parameter space $E$, $\sigma$ is a sigma algebra over $\Omega$ and $P_\mu$ is a probability measure defined on $\sigma$. Let $X$ be a random variable taking values in a sample space $\Omega$. Assume that $X$ has a probability density function $f_\mu(x)$ with respect to a given measure. We will illustrate our approach for the case of a location parameter, i.e. $f_\mu(x) = f(x - \mu)$ for some fixed probability density function $f$. Applications of the location parameter case include modeling of a signal of unknown amplitude $\mu$ in additive noise whose probability density function is given by $f$.

Define the hypotheses $H_0, \ldots, H_n$ by:

$$H_i:\ f_{\mu_i}(x) = f(x - \mu_i), \quad i = 0, 1, \ldots, n \qquad (1)$$

Let $R_0, R_1, \ldots, R_n$ be the decision regions for hypotheses $H_0, H_1, \ldots, H_n$, respectively, i.e. the classifier declares $\mu = \mu_i$ if and only if $x \in R_i$, $i = 0, 1, \ldots, n$. The probability of a correct decision under hypothesis $H_i$, $i = 0, 1, \ldots, n$ is given by

$$P_{\mu_i}(\text{decide } H_i) = P_{\mu_i}(X \in R_i) \qquad (2)$$

We will be interested in choosing the decision regions $R_0, R_1, \ldots, R_n$ such that the worst case correct classification probability $\min_i P_{\mu_i}(\text{decide } H_i)$ is maximized subject to a given upper bound $\alpha \in (0, 1]$ on the false alarm probability $1 - P_{\mu_0}(\text{decide } H_0)$. A decision rule which maximizes the worst case correct classification probability under a false alarm constraint is called a constrained min-max classifier. In [2] it was shown that the constrained min-max classifier is a weighted likelihood ratio test:

$$\max_{i>0} \left\{ \frac{c_i f_{\mu_i}(x)}{f_{\mu_0}(x)} \right\} \ \underset{H_0}{\overset{H_{i_{\max}}}{\gtrless}}\ \gamma, \qquad (3)$$

i.e. if the maximum weighted likelihood ratio exceeds the threshold $\gamma$, then decide $H_{i_{\max}}$ where $i_{\max} = \arg\max_{i>0} \{c_i f_{\mu_i}(x)/f_{\mu_0}(x)\}$; otherwise decide $H_0$. The weights $c_1, \ldots, c_n$ are computed as the solution to a nonlinear optimization problem:

$$\min_{c_1, \ldots, c_n} \sum_{i=1}^{n} c_i P_{\mu_i}(\text{decide } H_i). \qquad (4)$$

The threshold $\gamma$ is determined using the specified bound $\alpha$. Solution of the nonlinear optimization problem (4) could be computationally expensive. We will outline an alternative solution scheme which characterizes the min-max optimal classifier by means of a sufficient condition.

Suppose that the parameterized density $f_\mu(x) = f(x - \mu)$ has infinite support ($f(x) > 0$ for all $x$) and has a monotone likelihood ratio. The infinite support assumption is made to simplify the discussion of algorithm convergence. Infinite support is not absolutely necessary for the algorithm to work. An important class of probability densities that satisfies the monotone likelihood ratio property is the single parameter exponential family. Furthermore, a sufficient condition for $f(x - \mu)$ to have a monotone likelihood ratio is for the function $-\log f(x)$ to be convex in $x$ [4, page 509]. The normal, the double exponential and the logistic distributions all satisfy the convexity condition. Under the monotone likelihood ratio assumption, it can be shown that the constrained min-max classifier (3) gives rise to the following decision regions $R_0, R_1, \ldots, R_n$:


$$\begin{aligned}
R_0 &= (-\infty, x_0]; \\
R_i &= (x_{i-1}, x_i], \quad i = 1, \ldots, n-1; \\
R_n &= (x_{n-1}, \infty)
\end{aligned} \qquad (5)$$

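The correct decision probabilities are given by:

$$\begin{aligned}
P_{\mu_0}(X \in R_0) &= F(x_0 - \mu_0) \\
P_{\mu_1}(X \in R_1) &= F(x_1 - \mu_1) - F(x_0 - \mu_1) \\
&\ \ \vdots \\
P_{\mu_n}(X \in R_n) &= 1 - F(x_{n-1} - \mu_n)
\end{aligned} \qquad (6)$$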

where $F$ is the cumulative distribution function with density $f$. The acceptance region $R_0$ for the null hypothesis $H_0$ can be specified explicitly. For any given value of $\alpha \in (0, 1]$, there exists a value of $x_0$ that satisfies the false alarm constraint: $x_0 = F^{-1}(1 - \alpha) + \mu_0$. The remaining decision boundary values $x_1, \ldots, x_{n-1}$ will be computed by an iterative procedure.
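
The boundary $x_0$ and the correct decision probabilities of (6) can be computed as follows. This is a minimal sketch with hypothetical function names. It assumes Gaussian noise, represented by Python's `statistics.NormalDist`, purely for illustration, and it requires $\alpha < 1$ because the inverse CDF is not finite at 0.

```python
from statistics import NormalDist

def null_boundary(alpha, mu0, noise):
    """x0 = F^{-1}(1 - alpha) + mu0, where `noise` supplies the zero-mean CDF F."""
    return noise.inv_cdf(1.0 - alpha) + mu0

def correct_decision_probs(boundaries, mus, noise):
    """Probabilities in (6).

    boundaries = [x0, ..., x_{n-1}], mus = [mu0, ..., mun].
    """
    F = noise.cdf
    n = len(mus) - 1
    probs = [F(boundaries[0] - mus[0])]
    for i in range(1, n):
        probs.append(F(boundaries[i] - mus[i]) - F(boundaries[i - 1] - mus[i]))
    probs.append(1.0 - F(boundaries[n - 1] - mus[n]))
    return probs

# Illustrative values only: three alternatives at unit spacing, noise std 0.6.
noise = NormalDist(0.0, 0.6)
x0 = null_boundary(alpha=0.1, mu0=0.0, noise=noise)
probs = correct_decision_probs([x0, 1.5, 2.5], [0.0, 1.0, 2.0, 3.0], noise)
```

Here `1 - probs[0]` recovers the false alarm probability $\alpha$, while the remaining entries are generally unequal until the boundaries $x_1, \ldots, x_{n-1}$ have been tuned.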

A sufficient condition for min-max optimality is the equalization of the correct classification probabilities $P_{\mu_i}(\text{decide } H_i)$ for $i = 1, \ldots, n$ [2, Corollary 2]. The equalization condition is represented by the set of equations

$$P_{\mu_i}(\text{decide } H_i) = p, \quad i = 1, \ldots, n \qquad (7)$$

where $p \in (0, 1)$ is the unknown common value of the correct classification probabilities. Let $y = [x_1, \ldots, x_{n-1}, p]^T$ ($T$ denotes matrix transpose) and define the function $G(y)$ as follows.

$$G(y) = \begin{bmatrix}
F(x_1 - \mu_1) - F(x_0 - \mu_1) - p \\
F(x_2 - \mu_2) - F(x_1 - \mu_2) - p \\
\vdots \\
F(x_{n-1} - \mu_{n-1}) - F(x_{n-2} - \mu_{n-1}) - p \\
1 - F(x_{n-1} - \mu_n) - p
\end{bmatrix} \qquad (8)$$

Then the set of equations (7) is equivalent to

$$G(y) = [0, \ldots, 0]^T \qquad (9)$$

We propose to solve (9) iteratively using Newton's root finding method. More specifically, we consider the sequence $y(k)$ generated through the iterations

$$y(k+1) = y(k) - J^{-1}(y(k))\, G(y(k)), \qquad (10)$$

where $J(y)$ is the Jacobian of the function $G(y)$, i.e.

$$[J(y)]_{ij} \overset{\mathrm{def}}{=} \frac{\partial [G(y)]_i}{\partial y_j} \qquad (11)$$

For $j = 1, \ldots, n-1$, $y_j = x_j$ and $y_n = p$. Therefore, the elements of $J(y)$ are found from (8):

$$[J(y)]_{ij} = \begin{cases}
f(x_j - \mu_j), & \text{if } i = j \le n-1 \\
-f(x_j - \mu_{j+1}), & \text{if } i = j+1 \\
-1, & j = n \\
0, & \text{otherwise}
\end{cases} \qquad (12)$$

A few words about the convergence of the iterative algorithm (10) are in order. Assume that there exists a solution $y^*$ to the equation (9). If

1. $J^{-1}(y^*)$ exists (the Jacobian is invertible); and

2. $\|J(y^* + \delta y) - J(y^*)\| \le \gamma \|\delta y\|$ for some $\gamma > 0$ and for all sufficiently small perturbations $\delta y$ ($J$ is Lipschitz continuous); and

3. $\|J^{-1}(y^*)\| \le \beta$ for some $\beta > 0$ (the norm of the Jacobian inverse is bounded from above);

then for any starting point $y(0)$ that is sufficiently close to $y^*$, the sequence $y(k)$ generated through (10) is well-defined, converges locally to $y^*$ and has a quadratic rate of convergence with coefficient $\gamma\beta$ [5, Theorem 5.2.1]. Next we provide a sketch of the proof that the three conditions are satisfied in the present problem.

Condition 1: Since $f(x) > 0$ for all $x$, the columns of $J$ are linearly independent.

Condition 2: The non-zero elements of the difference $\delta J$ of two Jacobians evaluated at points $y + \delta y$ and $y$, respectively, are of the form $\pm(f(x_i + \delta x_i - \mu_j) - f(x_i - \mu_j))$. But $f(x_i + \delta x_i - \mu_j) - f(x_i - \mu_j) = \int_{x_i}^{x_i + \delta x_i} f'(t - \mu_j)\,dt$. Assuming that the derivative $f'$ of the probability density function $f$ is bounded, i.e. $\sup_x |f'(x)| \le M$ for some $M > 0$, it follows that $|f(x_i + \delta x_i - \mu_j) - f(x_i - \mu_j)| \le M |\delta x_i|$. It can then be shown that the Frobenius norm of $\delta J$, denoted by $\|\delta J\|_F$, is bounded above by a multiple of the $l_2$ norm of the vector $\delta y$. Since the $l_2$-induced norm of $\delta J$ is no larger than the Frobenius norm of $\delta J$ [5, Theorem 3.1.3], Lipschitz continuity is satisfied.

Condition 3: For arbitrary $z = [z_1, \ldots, z_n]^T$, consider the linear equation

$$J(y(k))\, y(k+1) = z. \qquad (13)$$


For notational simplicity, we will write the Jacobian as $J$ and suppress its dependence on $y$. After Gaussian elimination, the equation (13) can be re-written in terms of an upper triangular matrix $\tilde{J}$: $\tilde{J} y(k+1) = \tilde{z}$. The matrices $\tilde{J}$ and $J$ are related by a non-singular transformation $T$, i.e. $\tilde{J} = TJ$. It suffices to establish an upper bound on the Frobenius norm $\|\tilde{J}^{-1}\|_F$ of $\tilde{J}^{-1}$ because $\|J^{-1}\|_F$ and $\|\tilde{J}^{-1}\|_F$ are related by $\|J^{-1}\|_F \le \|T\|_F \|\tilde{J}^{-1}\|_F$ and $\|T\|_F$ is bounded. Suppose that the last column of $\tilde{J}$ is the vector $[-a_1, \ldots, -a_n]^T$, i.e. $[\tilde{J}]_{in} = -a_i$, $i = 1, \ldots, n$. It can be shown that $a_1 = 1$ and $a_i = 1 + a_{i-1}\, f(x_{i-1} - \mu_i)/f(x_{i-1} - \mu_{i-1})$, $i = 2, \ldots, n$. The Frobenius norm of $\tilde{J}^{-1}$ can be expressed as $\|\tilde{J}^{-1}\|_F = [\mathrm{tr}((\tilde{J}^{-1})^T \tilde{J}^{-1})]^{1/2}$, where "tr" denotes matrix trace. After some algebra, we obtain an upper bound:

$$\|\tilde{J}^{-1}\|_F \le \left[ \frac{1}{a_n^2} + \sum_{i=1}^{n-1} \frac{a_i^2 + a_n^2}{f^2(x_i - \mu_i)\, a_n^2} \right]^{1/2} \le \left( (n-1)L + 1 \right)^{1/2} \qquad (14)$$

where $L = \max_i \{(a_i^2 + a_n^2)/f^2(x_i - \mu_i)\}$, $i = 1, \ldots, n-1$. In finite dimensional spaces all norms are equivalent, therefore there exists some $\beta > 0$ such that $\|J^{-1}\| \le \beta$.
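
The Newton iteration (10) on the equalization equations (9) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the starting point (boundaries midway between adjacent means, $p = 0.5$), the stopping tolerance and the iteration cap are all assumptions, and Gaussian noise via `statistics.NormalDist` is used only as an example of a density with infinite support. The linear system is solved by Gaussian elimination instead of forming $J^{-1}$ explicitly.

```python
from statistics import NormalDist

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, m):
            factor = aug[i][c] / aug[c][c]
            for j in range(c, m + 1):
                aug[i][j] -= factor * aug[c][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x

def equalizer_newton(mus, alpha, noise, tol=1e-12, max_iter=50):
    """Newton iteration for G(y) = 0 with y = [x_1, ..., x_{n-1}, p].

    mus = [mu0, ..., mun]; returns ([x0, ..., x_{n-1}], p, converged).
    """
    n = len(mus) - 1
    F, f = noise.cdf, noise.pdf
    x0 = noise.inv_cdf(1.0 - alpha) + mus[0]      # fixed by the false alarm bound
    # Assumed starting point: boundaries between adjacent means, p = 0.5.
    y = [0.5 * (mus[i] + mus[i + 1]) for i in range(1, n)] + [0.5]
    for _ in range(max_iter):
        x, p = [x0] + y[:-1], y[-1]
        G = [F(x[i] - mus[i]) - F(x[i - 1] - mus[i]) - p for i in range(1, n)]
        G.append(1.0 - F(x[n - 1] - mus[n]) - p)
        if max(abs(g) for g in G) < tol:
            return x, p, True
        J = [[0.0] * n for _ in range(n)]
        for i in range(1, n + 1):                 # row i belongs to hypothesis H_i
            if i <= n - 1:
                J[i - 1][i - 1] = f(x[i] - mus[i])        # d/dx_i
            if i >= 2:
                J[i - 1][i - 2] = -f(x[i - 1] - mus[i])   # d/dx_{i-1}
            J[i - 1][n - 1] = -1.0                        # d/dp
        step = solve(J, G)
        y = [yi - si for yi, si in zip(y, step)]
    return [x0] + y[:-1], y[-1], False

# Illustrative run: three alternatives at unit spacing, noise std 0.6, alpha = 0.1.
boundaries, p_common, converged = equalizer_newton(
    [0.0, 1.0, 2.0, 3.0], alpha=0.1, noise=NormalDist(0.0, 0.6))
```

When `converged` is true, the correct classification probabilities under $H_1, \ldots, H_n$ all equal `p_common` to within the tolerance. Because the convergence result is local, a poor starting point may require a different initialization or a damped step.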


3. Applications on Simultaneous Detection and Classification in Gaussian Noise

We will illustrate the iterative algorithm (10) for the case of normal densities. Let $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)$ and $\mu_i = i$ for $i = 0, 1, \ldots, n$. We consider three different simultaneous detection and estimation rules. One of the rules is the constrained min-max classifier described earlier, which maximizes the worst case classification performance under a given false alarm constraint. One can also perform simultaneous detection and estimation by combining a classifier with a separately designed detector. With this strategy, the data are not presented to the classifier unless the detector declares "signal present". In other words, the classifier is gated by the detector.

We consider two gated classifiers and compare their performance to the performance of the constrained min-max classifier. Both of the gated classifiers use a min-max optimal detector for detection, but they differ in the design of their classifier structures. One of them uses an unconstrained min-max classifier designed independently of any detection objective. An unconstrained min-max classifier maximizes the worst case correct classification probability as if signal presence is certain. This classifier is obtained by removing the false alarm constraint ($\alpha = 1$) in the constrained min-max classifier. The other gated classifier uses a conditionally min-max classifier designed with explicit knowledge of the detector decision regions. A conditionally min-max optimal classifier maximizes the worst case correct classification probability conditioned on the detector having declared signal present. The conditionally min-max classifier is obtained by replacing all the densities $f_{\mu_i}(x)$ under the alternative hypotheses $H_1, \ldots, H_n$ with the conditional densities $f_{\mu_i}(x \mid X \notin R_0)$ in the analysis of Section 2. Since we are using the min-max detector, $R_0 = (-\infty, x_0]$ as before, and $x_0$ is specified by the false alarm probability $\alpha$.
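The gating step can be sketched numerically. The following is a minimal illustration, assuming the null density is zero-mean Gaussian with standard deviation 0.6 (reading the example's spread value as a standard deviation is an assumption) and that the detector declares "signal present" when $x > x_0$. The function names are hypothetical and not from the paper.

```python
from statistics import NormalDist


def detector_threshold(alpha, sigma):
    """Return x0 with P(X > x0 | H0) = alpha for X ~ N(0, sigma^2) under H0.

    alpha must lie strictly between 0 and 1 (inv_cdf rejects 0 and 1).
    """
    return NormalDist(0.0, sigma).inv_cdf(1.0 - alpha)


def conditional_density(x, mu, sigma, x0):
    """Density of X ~ N(mu, sigma^2) conditioned on X > x0 (detector fired)."""
    if x <= x0:
        return 0.0
    dist = NormalDist(mu, sigma)
    return dist.pdf(x) / (1.0 - dist.cdf(x0))


sigma = 0.6   # assumed standard deviation
alpha = 0.1   # illustrative false alarm probability
x0 = detector_threshold(alpha, sigma)
print(f"x0 = {x0:.4f}")
```

The conditional density is the ordinary Gaussian density renormalized over the region where the detector fires, which is the substitution the conditionally min-max design relies on.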

Figure 1 shows the variation of the worst case correct classification probability $\min_i P_i(\text{decide } H_i)$ for the three simultaneous detection and estimation rules as a function of the false alarm probability $\alpha$. In this example $\sigma = 0.6$, and there are five alternative hypotheses ($n = 5$). In general, the constrained min-max classifier (solid line) performs best, while the unconstrained min-max classifier gated by the min-max detector (dashed line) gives rise to the lowest performance. The conditionally min-max classifier gated by the min-max detector (dashdot line), although better than the unconstrained min-max classifier, still falls significantly short of the performance of the constrained min-max classifier for small $\alpha$. On the other hand, as $\alpha$ increases all three curves come together as expected. This is because for high $\alpha$, the three simultaneous detection and estimation rules degenerate to an unconstrained min-max classifier for the alternative hypotheses $H_1, \ldots, H_n$.

Figure 1. Worst case correct classification probability as a function of $\alpha$.
Abstract

A hypothesis H is parametric if every distribution from the process defined by H belongs to a family of distributions characterized by a finite number of parameters; on the other hand, if the distribution cannot be defined by a finite number of parameters, the hypothesis is nonparametric.

In this paper, we analyze a detector based on the optimum permutation test, in the Neyman-Pearson sense and under Gaussian noise conditions, which operates on the radar video signal. The computational complexity of the detector is high and its implementation in real time is difficult, because the number of operations increases with the factorial of the number of samples. We also present an algorithm that reduces the computational work required.

We also present the detectability characteristics of the optimum permutation test under Gaussian noise environments and different types of target models (nonfluctuating, Swerling I and Swerling II). The detection probability versus signal-to-noise ratio is estimated by Monte-Carlo simulations for different parameter values (N pulses, M reference samples and false alarm probability $P_{fa}$).


1.-Introduction.

There are many possibilities for solving radar detection problems by means of nonparametric tests, which do not have a global solution. We are interested in the class of binary nonparametric tests called permutation tests, which are distribution-free under independent and identically distributed (IID) samples.

The distribution of a block of IID samples is invariant under permutation of its sample components. That is, consider an IID sample vector $(x_1, x_2, \ldots, x_n)$ of n samples, where $F(x)$ is the distribution function of a sample. If $F(x_1, x_2, \ldots, x_n) = F(x_1)F(x_2)\cdots F(x_n)$, then

$$F(x_1, x_2, \ldots, x_n) = F(x_2, x_1, \ldots, x_n) = \cdots = F(x_n, \ldots, x_2, x_1)$$

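To generate a permutation test, the sample space $R^n$ is partitioned into $n!$ regions $D_i$ ($i = 1, 2, \ldots, n!$), where

$$D_i = \{x : \text{if } x \in D_i, \text{ then } x(\text{permutation}) \in D_j\}$$

in such a way that

$$\bigcup_{i=1}^{n!} D_i = R^n \quad \text{and} \quad D_i \cap D_j = \emptyset, \quad i, j = 1, 2, \ldots, n!, \; i \neq j$$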


Each sample vector x belongs to one of these regions $D_i$ ($i = 1, 2, \ldots, n!$), and we can get a different vector by permuting its components, each one belonging to a different region $D_j$. It is possible to partition $R^n$-space in different ways in order to fulfil the D-conditions. A particular case is the well known rank test [1,2,3], whose regions are

$$D_i = \{x : x_{r_1} < x_{r_2} < \cdots < x_{r_n}\}$$

with $r_j \in \{1, 2, \ldots, n\}$ and $r_j \neq r_k$ when $j \neq k$, $j, k = 1, 2, \ldots, n$.

Under the null hypothesis $H_0$ (target absent), the probability that the sample vector x belongs to one of the regions is $1/n!$, i.e. $\text{Prob}(x \in D_i) = 1/n!$.

Under the alternative hypothesis $H_1$ (target present), some D-regions have more probability measure than others, and the probability that $x \in D_i$ ($i = 1, \ldots, n!$) is no longer uniform.

Given a D-partition, we define the decision region as the union of K regions $D_i$. In order to get the maximum probability of detection, we select the D-regions with the largest probabilities. Under $H_0$, the false alarm probability $P_{fa}$ is $K/n!$, where K is the number of $D_i$-regions selected. The optimum permutation test is the partition that achieves the maximum detection probability.

In radar applications, we have N sample vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, where N is the number of pulses per antenna beamwidth. Each sample vector $\mathbf{x}_i$ has M noise reference samples $x_{ij}$, $j = 1, 2, \ldots, M$, and the sample under test $x_i$, i.e. $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iM}, x_i)$. Under the null hypothesis $H_0$ (target absent) we suppose that the components of $\mathbf{x}_i$ are IID, but under the alternative hypothesis $H_1$ (target present) they are not IID (the reference samples are IID and $x_i$ has a different distribution from the $x_{ij}$, $j = 1, 2, \ldots, M$).


0-8186-7576-4/96  $5.00  ©  1996  IEEE 
Now, the distributions associated with $H_0$ are

$$f(x|H_0) = \prod_{i=1}^{N}\left[\prod_{j=1}^{M} f_{0i}(x_{ij})\right] f_{0i}(x_i) \qquad (1a)$$

where $f_{0i}(\cdot)$ is the probability density function of a noise sample in the ith pulse.

Under $H_1$ we have

$$f(x|H_1) = \prod_{i=1}^{N}\left[\prod_{j=1}^{M} f_{0i}(x_{ij})\right] f_{1i}(x_i) \qquad (1b)$$

where $f_{1i}(\cdot)$ is the probability density function of a sample under test $x_i$ (signal + noise) in the ith pulse.

2.-Permutation Test Algorithm.

In order to test $H_0$ against $H_1$ in the Neyman-Pearson sense, we take the likelihood ratio

permuting all the samples in each vector $\mathbf{x}_i$, $i = 1, 2, \ldots, N$, and selecting the upper results in (4). The number K of upper results selected depends on the false alarm probability, i.e. $P_{fa} = K/(M+1)^N$, where K is the number of $D_i$-regions associated with the upper results of (3) after doing the permutations.

We optimize the permutation test using (3) or (4) in the following way: for $i = 1$ to N we have the matrix (for application of (4a))


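$$\begin{bmatrix} x_{11}^2 & x_{12}^2 & \cdots & x_{1M}^2 & x_1^2 \\ x_{21}^2 & x_{22}^2 & \cdots & x_{2M}^2 & x_2^2 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{N1}^2 & x_{N2}^2 & \cdots & x_{NM}^2 & x_N^2 \end{bmatrix} \qquad (5)$$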

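$$\frac{f(x|H_1)}{f(x|H_0)} = \frac{\prod_{i=1}^{N}\left[\prod_{j=1}^{M} f_{0i}(x_{ij})\right] f_{1i}(x_i)}{\prod_{i=1}^{N}\left[\prod_{j=1}^{M} f_{0i}(x_{ij})\right] f_{0i}(x_i)} = \prod_{i=1}^{N}\frac{f_{1i}(x_i)}{f_{0i}(x_i)} \qquad (2)$$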


adding the elements of the right column, we have

$$y = \sum_{i=1}^{N} x_i^2 \qquad (6)$$


In the case of Gaussian noise and nonfluctuating target models, applying (2) at the output of a linear envelope detector, we have (after taking the natural logarithm):

$$\ln\frac{f(x|H_1)}{f(x|H_0)} = \sum_{i=1}^{N}\left[\,\cdots\,\right] \qquad (3)$$


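where S is the signal-to-noise ratio (SNR), and $I_0(\cdot)$ is the modified Bessel function of the first kind and order zero.

(a) If the signal-to-noise ratio (SNR) is low

$$\ln\frac{f(x|H_1)}{f(x|H_0)} \sim \sum_{i=1}^{N} x_i^2 \qquad (4a)$$

(b) If the SNR is high

$$\ln\frac{f(x|H_1)}{f(x|H_0)} \sim \sum_{i=1}^{N} |x_i| \qquad (4b)$$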
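The low-SNR and high-SNR statistics are consistent with the standard limiting behaviour of $\ln I_0$. As a hedged sketch, assume the envelope samples are normalized to unit noise power, so that the per-pulse likelihood ratio of a Rician against a Rayleigh envelope is $e^{-S} I_0(\sqrt{2S}\,x_i)$. This normalization is an assumption and is not taken from the paper.

```latex
\ln I_0(z) \approx \frac{z^2}{4} \quad (z \ll 1),
\qquad
\ln I_0(z) \approx z - \tfrac{1}{2}\ln(2\pi z) \quad (z \gg 1)

% Low SNR: the statistic is driven by the sum of squares
\sum_{i=1}^{N} \ln I_0\!\left(\sqrt{2S}\,x_i\right) \approx \frac{S}{2}\sum_{i=1}^{N} x_i^2

% High SNR: the linear term dominates (x_i \ge 0 for an envelope)
\sum_{i=1}^{N} \ln I_0\!\left(\sqrt{2S}\,x_i\right) \approx \sqrt{2S}\sum_{i=1}^{N} |x_i|
```

Under that assumption, additive constants and positive scale factors do not change the ordering of the permuted sums, which is why only $\sum x_i^2$ or $\sum |x_i|$ needs to be ranked.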


Now, permuting the components in each row vector of (5), summing the right column, and ordering these $(M+1)^N$ sums from the lowest to the highest, we get the set of the K greatest sums. If (6) is in this set, the target is declared present (hypothesis $H_1$); otherwise, the target is declared absent (hypothesis $H_0$).
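The exhaustive form of this decision rule can be sketched directly. The following is a minimal brute-force illustration, assuming each row holds the M reference values followed by the value under test (already squared, as for the low-SNR statistic). The function name is hypothetical. It enumerates all $(M+1)^N$ sums, so it is only practical for very small M and N.

```python
from itertools import product


def permutation_test(rows, k):
    """Brute-force permutation test.

    rows: list of N rows; each row holds M reference values followed by
          the value under test as its last element.
    k:    number of largest sums that form the decision region.
    Returns True (target declared present) if the observed sum of the
    last column is among the k largest of the (M+1)^N candidate sums.
    """
    observed = sum(row[-1] for row in rows)
    # A permutation of a row places one of its M+1 entries in the last
    # column, so each candidate sum picks exactly one entry per row.
    sums = sorted((sum(choice) for choice in product(*rows)), reverse=True)
    return observed >= sums[k - 1]


strong = [[1.0, 2.0, 9.0], [0.5, 1.0, 4.0]]   # test cells dominate
weak = [[9.0, 2.0, 1.0], [4.0, 1.0, 0.5]]     # test cells are the smallest
print(permutation_test(strong, 1), permutation_test(weak, 1))
```

Ties are resolved in favour of detection here (a sum equal to the k-th largest counts as inside the region); that convention is a choice made for the sketch.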

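An efficient algorithm is as follows. First, in (5) we order the components of each row from the lowest to the highest, obtaining the matrix (7):

$$\begin{bmatrix} z_{11}^2 & z_{12}^2 & \cdots & z_{1M}^2 & z_{1,M+1}^2 \\ z_{21}^2 & z_{22}^2 & \cdots & z_{2M}^2 & z_{2,M+1}^2 \\ \vdots & \vdots & & \vdots & \vdots \\ z_{N1}^2 & z_{N2}^2 & \cdots & z_{NM}^2 & z_{N,M+1}^2 \end{bmatrix} \qquad (7)$$

where $z_{ij}^2 \le z_{i,j+1}^2$.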


We optimize the permutation test using (2) or (3), by
we get

$$y^{(M+1)} = \sum_{i=1}^{N} z_{i,M+1}^2 \qquad (8)$$

Note that (8) is the upper value. Now, swapping $z_{1,M}$ and $z_{1,M+1}$ and summing the new right column again, we get the next value, and so on, in order to obtain the K upper values; so we have $y^{(C)}$ with $C = M+1, M, \ldots, 1$. In each step we compare $y^{(C)}$ with the value y of (6); if $y^{(C)} < y$, we stop the process with the first row and go to the second row vector of (7). So in (7) we swap $z_{2,M}$ and $z_{2,M+1}$ in order to get the next value, and so on. We repeat this algorithm in order to know whether y belongs to the K upper values. If we get K upper values $y_r^{(C)} > y$, where $1 \le C \le M$ and $1 \le r \le N$, it is not necessary to continue the process, testing the N rows and doing the M+1 swaps in each row; in this case we suppose that the target is absent (hypothesis $H_0$).
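The early-stopping idea can be sketched with a different but related technique: a best-first enumeration of candidate sums using a max-heap over the sorted rows. This is not the row-by-row swapping procedure described above. It is a hedged stand-in that shares its key property, namely that the search stops as soon as K sums larger than the observed value are found, or as soon as the remaining sums cannot exceed it. The function name is hypothetical.

```python
import heapq


def permutation_test_early_stop(rows, k):
    """Return True (target present) if fewer than k candidate sums
    strictly exceed the observed sum of the last column."""
    observed = sum(row[-1] for row in rows)
    ordered = [sorted(row, reverse=True) for row in rows]

    def total(idx):
        # Sum one entry per row, in row order, for the given column indices.
        return sum(ordered[r][i] for r, i in enumerate(idx))

    start = (0,) * len(rows)          # largest entry of every row
    heap = [(-total(start), start)]   # max-heap via negated sums
    seen = {start}
    larger = 0
    while heap:
        neg_sum, idx = heapq.heappop(heap)
        if -neg_sum <= observed:
            return True               # no remaining sum can exceed y
        larger += 1
        if larger >= k:
            return False              # k sums already exceed y
        # Neighbours: step one row down to its next smaller entry.
        for r, i in enumerate(idx):
            if i + 1 < len(ordered[r]):
                nxt = idx[:r] + (i + 1,) + idx[r + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-total(nxt), nxt))
    return True


rows = [[1, 2, 9], [3, 1, 4], [2, 7, 5]]
print(permutation_test_early_stop(rows, 2), permutation_test_early_stop(rows, 3))
```

At most K states are expanded before a decision is reached, so the cost grows with K rather than with $(M+1)^N$.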

3.- Computer results

We have analyzed the detection performance of permutation tests in terms of detection probability at a constant false alarm probability, using (4a) as the statistic for the implementation of the algorithm described.

For a particular target model, the detection probability is a function of SNR, N, and M. We have considered $P_{fa} = 10^{-6}$ and $10^{-8}$ as practical radar values. Figures 1, 2 and 3 present $P_d$ curves with M = 6 and N = 10 and 12 for different types of targets (Swerling II, Swerling I and nonfluctuating). As can be seen, we obtain an important variation in $P_d$ for a small difference in N. Also, it is observed that as $P_{fa}$ decreases, the difference between the $P_d$ curves increases.

Figures 4, 5 and 6 show $P_d$ curves for N = 8 and M = 10 and 16. The variation in $P_d$ with N is more important than the variation with M, because the integrated pulses convey more information than the noise reference samples. Also, from Figures 3 and 6, we can see very large differences in SNR for the $P_d$ curves of N = 10 and 12 when $P_{fa} = 10^{-8}$. More research work is required on this fact.

Finally, since $P_{fa} = K/(M+1)^N$, the computational complexity of the permutation test algorithm increases with M and N for a specific $P_{fa}$ value, because K increases. Consequently an optimization process is required for the best determination of N, M and SNR in practical applications.
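The growth of K with M and N can be made concrete with a short calculation. The sketch below inverts $P_{fa} = K/(M+1)^N$ for the largest integer K that does not exceed the requested false alarm probability. The function name and the value $P_{fa} = 10^{-6}$ are illustrative assumptions.

```python
import math


def decision_region_size(p_fa, m, n):
    """Largest integer K with K / (M+1)^N <= p_fa."""
    return math.floor(p_fa * (m + 1) ** n)


for n_pulses in (10, 12):
    k = decision_region_size(1e-6, 6, n_pulses)
    print(f"M=6, N={n_pulses}: K={k}")
```

With M = 6, going from N = 10 to N = 12 multiplies $(M+1)^N$ by 49, so the number of upper sums that must be examined grows by roughly the same factor.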

Other results comparing optimum parametric and rank detectors against the permutation test will be published elsewhere. Differences of up to 1 dB in SNR are found between the rank test and the permutation test for the same $P_d$, $P_{fa}$, N and M.


Fig. 1: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with M=6, N=10 and N=12, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for nonfluctuating target model.

Fig. 2: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with M=6, N=10 and N=12, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for Swerling I target model.

Fig. 3: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with M=6, N=10 and N=12, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for Swerling II target model.

Fig. 4: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with N=8, M=10 and M=16, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for nonfluctuating target model.

Fig. 5: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with N=8, M=10 and M=16, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for Swerling I target model.

Fig. 6: Detection probability $P_d$ versus signal-to-noise ratio (SNR) for permutation test, with N=8, M=10 and M=16, with false alarm probability $P_{fa} = 10^{-6}$ and $P_{fa} = 10^{-8}$, for Swerling II target model.
ABSTRACT

A generalized likelihood-ratio test (GLRT) detector is derived for detecting a space-time signal in the presence of unknown subspace interference and unknown target doppler. The near optimality and constant false alarm rate (CFAR) property of the GLRT are shown through its relationship to the uniformly most powerful invariant (UMPI) test using a simple approximation. Examples are presented comparing the performance of the proposed detector against the UMPI test. The ROC curves indicate that the GLRT detector compares favorably to the UMPI detector.

1.  INTRODUCTION 

We start by reviewing the subspace interference model. Suppose we have an array of m sensors that are simultaneously sampled at time $t_k$ and the outputs stacked into the vector $x(t_k) = [x_1(t_k)\; x_2(t_k)\; \cdots\; x_m(t_k)]^T$. We say the interference is subspace if at any instant of time, it can be represented as

$$x(t_k) = \sum_{n=1}^{r} h_n\,\theta_{t_k}(n) = H\theta_{t_k} \qquad (1)$$

where $H = [h_1 \cdots h_r]$ is an $m \times r$ matrix whose columns generate the interference space and $\theta_{t_k}$ is an $r \times 1$ vector of scale factors. The data vector $x(t_k)$ is a linear combination of the columns of H, which remain fixed, i.e., do not change as a function of time. The only dependence on time is through $\theta_{t_k}$. The subspace model has wide application.

Many types of interference components, e.g., clutter, can be represented using a subspace model (see Scharf [1] for an extensive treatise on subspace or reduced-rank modeling and [2]). For example, if the sensor outputs $x(t_k) = [g_1 I(t - \tau_1)\; g_2 I(t - \tau_2)\; \cdots\; g_m I(t - \tau_m)]^T$ (where $I(t)$ is some interference time series) are time delay steered to align the interference wavefronts (output of the kth sensor is delayed by $\tau_k - \tau_1$), then $x(t_k) = I(t_k)[g_1\; g_2\; \cdots\; g_m]^T$, where $g_k$ is the gain of the kth sensor. Also, colored noise can be modeled as subspace where the subspace dimension is proportional to bandwidth [1, 2]. Thus narrowband components can be represented using a low order subspace model. Another aspect of the subspace model is that it inherently accounts for array calibration error, e.g., gain errors.


1.1. Signal and interference model

Usually the received signal has undergone multipath distortion or time dispersion from the channel. Therefore the waveform received by the nth sensor is modeled as

$$s_n(t) = \sum_{k=1}^{L} c_k\, s\!\left(\frac{t - \tau_k - \delta\tau_{nk}}{a}\right) \qquad (2)$$

where $s(t)$ is the signal replica, $\tau_k$ is the time it takes for the signal to travel from the source to sensor 1 over path k, $\delta\tau_{nk}$ is the inter-sensor propagation time delay measured relative to sensor 1, $a = (c + v)/(c - v)$ is the contraction/dilation of the signal(s) due to target/platform motion (where c and v are the propagation and relative target velocities, assumed to be the same for each multipath), and $c_k$ is a scalar corresponding to the attenuation from the kth path. The snapshot of sensor outputs at time $t_j$ is then

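$$s_j = \sum_{k=1}^{L} c_k \left[\, s(t_j/a - \tau_k/a)\;\; s(t_j/a - \tau_k/a - \delta\tau_{2k}/a)\;\; \cdots\;\; s(t_j/a - \tau_k/a - \delta\tau_{mk}/a) \,\right]^T \qquad (3)$$

or, after substituting $d_{kj}^{a}$ for the vector of data samples due to the kth path,

$$s_j = \sum_{k=1}^{L} c_k\, d_{kj}^{a} \qquad (4)$$

A total of K snapshots of data are collected at times $\{t_1, t_2, \ldots, t_K\}$ and stacked into a matrix. The matrix corresponding to the signal component in the data is then

$$S = [\, s_1 \mid s_2 \mid \cdots \mid s_K \,] \qquad (5)$$

and has the equivalent form

$$S = \sum_{k=1}^{L} c_k\, \mathcal{D}_k^{a} \qquad (6)$$

where $\mathcal{D}_k^{a} = [\, d_{k1}^{a} \mid d_{k2}^{a} \mid \cdots \mid d_{kK}^{a} \,]$.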

Using the above signal representation, the received space-time data matrices are modeled as

$$\mathcal{H}_0 : X = H\Theta + N \qquad (7)$$

$$\mathcal{H}_1 : X = H\Theta + \sum_{k=1}^{L} c_k\, \mathcal{D}_k^{a} + N \qquad (8)$$
under the signal absent and signal present hypotheses respectively, where $\Theta = [\theta_1\; \theta_2\; \cdots\; \theta_K]$. The elements of the background noise matrix N are modeled as IID complex Gaussian distributed with zero mean and variance $\sigma^2$. We now discuss the uniformly most powerful invariant (UMPI) test for this hypothesis testing problem.

1.2.  UMPI  test 

We want to test the hypothesis that $\sum_k |c_k|^2 = 0$ (signal absent) or $\sum_k |c_k|^2 > 0$ (signal present). We assume that the interference subspace H and doppler a are known, but that the parameters $\Theta$, $c_k$, and $\sigma^2$ in (7) and (8) are unknown and deterministic, i.e., can take on a range of values. In sonar and radar, it is usually difficult or impossible to determine distributions for the interference and signal parameters since the relevant scattering and channel physics are usually not known or, at best, partially known. This type of detection problem is called a composite hypothesis testing problem [8].

It is difficult to find an optimum test when no probability density function is available for the unknown parameters [8, 4]. Ideally, we would like to construct a uniformly most powerful (UMP) test [4]. A problem is that UMP tests usually do not exist [8, 4].

In [3] it is argued that principles of invariance should be used to find the UMP test which is invariant to the unknown nuisance parameters (e.g., noise variance, signal phase), known as the UMPI test. The motivation is that nuisance parameters are probably responsible for the nonexistence of the UMP test in the first place [3]. Also, in many applications the test should be invariant to nuisance parameters such as the background noise level, i.e., a CFAR test. However, the UMPI test is also difficult to find and may not exist. An alternative approach frequently used is to form the likelihood ratio and replace the unknown parameters by their maximum likelihood estimates [8]. This is called the generalized likelihood ratio test (GLRT).

Scharf [6] derived the GLRT for the related problem of detecting in a single data snapshot a subspace signal in the presence of subspace interference (when the subspace is known) and showed that it is the UMPI test. The space-time signal and interference models we have are analogous to the data model used by Scharf [6] if the matrices in (7) and (8) are vectorized (by stacking the matrix columns into a vector). Vectorizing (7) and (8) and applying the results of [6], the UMPI test is

$$\frac{\|P_{S'}\,x'\|^2}{\|P_{S'}^{\perp}\,x'\|^2} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \lambda \qquad (9)$$

where the projection operators are given by $P_H^{\perp} = I - H(H^H H)^{-1}H^H$, $P_{S'} = S'(S'^H S')^{-1}S'^H$, and $P_{S'}^{\perp} = I - P_{S'}$. The vectors are $S' = [\,\text{vec}(P_H^{\perp}\mathcal{D}_1^{a}) \mid \cdots \mid \text{vec}(P_H^{\perp}\mathcal{D}_L^{a})\,]$ and $x' = \text{vec}(P_H^{\perp}X)$. The scalar $\lambda$ is some threshold. The operator $\text{vec}(\cdot)$ takes a matrix and converts it to a vector representation by stacking the columns. The numerator of (9) can be interpreted as the magnitude-squared output of a space-time matched filter (beamforming-matched filter processing) using as the replica the part of the signal(s) which remains after the interference has been nulled. This is then normalized by an estimate of the background noise variance given by the denominator of (9). Because (9) is invariant to scalings of the data matrix and rotations in the column space of H, it is the best possible CFAR detector.

Unfortunately, test (9) usually cannot be implemented because the interference subspace matrix H is not known (e.g., it is a function of such things as the channel, direction of arrival, array geometry and sensor characteristics, which are either unknown or, at best, partially known) and the target multipath structure and doppler are also unknown. In previous work Kirsteins [5] proposed a GLRT detector for the above problem given that doppler is known. The intent here is to extend those results to the case when doppler is not known and determine the effect on performance. In the remainder of this paper we derive the GLRT for the above hypothesis testing problem assuming the interference subspace H and target doppler are both unknown, then discuss its relationship to the UMPI test and determine analytically the effect on performance when doppler is unknown. Finally, some numerical examples are presented comparing the performances of the GLRT and UMPI tests. We start by deriving the GLRT.

2.  GLRT 

A GLRT statistic for choosing between hypotheses (7) and (8) is

$$z_1 = \frac{\min_{H_0,\Theta_0} \|X - H_0\Theta_0\|_F^2}{\min_{H_1,\Theta_1,a,c_1,\ldots,c_L} \left\|X - H_1\Theta_1 - \sum_{k=1}^{L} c_k\,\mathcal{D}_k^{a}\right\|_F^2} \qquad (10)$$

where H, $\Theta$, a, $c_k$, and $\sigma^2$ have been treated as unknown. The GLRT statistic (10) is simply a ratio of fitting errors. The numerator is the error in fitting the matrix X by a rank r matrix and the denominator is the error in jointly fitting X by a rank r matrix and $\sum_{k=1}^{L} c_k\,\mathcal{D}_k^{a}$.

The numerator in (10) is easily evaluated using the singular value decomposition (SVD) of X as $\min_{H_0,\Theta_0} \|X - H_0\Theta_0\|_F^2 = \sum_{i=r+1}^{\min(m,K)} \sigma_i^2$, where the $\sigma_i$ are the singular values of X. We need to evaluate the denominator of (10). Unfortunately, a direct solution is not available. We propose an iterative scheme to perform the minimization based on the criss-cross regressions method of Gabriel [7] for solving the weighted low rank approximation problem. Basically the idea is to linearize the optimization problem, for each hypothesized doppler a, by holding H constant and then minimizing with respect to only $\Theta$ and the $c_k$. This is a standard linear least-squares fitting problem and is easy to solve. The procedure is then repeated, except that this time $\Theta$ is replaced with its estimate from the previous step and the minimization is now done with respect to H and the $c_k$. These steps are continued until convergence. The algorithm steps are summarized below:

a. Initialization. Iteration counter k is set to zero, $k = 0$. Select initial guess $H_0$.

b. $k = k + 1$

c. Holding $H_{k-1}$ fixed, minimize with respect to only $\Theta$ and the $c_i$:

$$\Theta_k,\, c^{(k)} = \arg\min_{\Theta,\, c_i} \left\|X - H_{k-1}\Theta - \sum_{i=1}^{L} c_i\,\mathcal{D}_i^{a}\right\|_F^2$$

d. Holding $\Theta_k$ fixed, minimize with respect to only H and the $c_i$:

$$H_k,\, c^{(k)} = \arg\min_{H,\, c_i} \left\|X - H\Theta_k - \sum_{i=1}^{L} c_i\,\mathcal{D}_i^{a}\right\|_F^2$$

e. Check if converged. If not converged, go back to step b.
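The alternating structure of steps a to e can be sketched in a heavily simplified setting: real-valued data, a single interference dimension (r = 1), and no signal term, so that each half-step is an ordinary least-squares regression with a closed form. This is an illustration of criss-cross regressions under those assumptions, not the detector's full minimization. The function name is hypothetical.

```python
def rank_one_criss_cross(X, iterations=50, tol=1e-12):
    """Alternating least squares fit X ~ h * theta^T (h: m-vector, theta: K-vector).

    Returns (h, theta, err), where err is the squared Frobenius fitting error.
    """
    m, K = len(X), len(X[0])
    h = [1.0] * m                      # step a: initial guess for H
    theta = [0.0] * K
    err = sum(X[i][j] ** 2 for i in range(m) for j in range(K))
    prev = err
    for _ in range(iterations):        # step b: next iteration
        # Step c: hold h fixed, regress each column of X on h.
        hh = sum(v * v for v in h)
        theta = [sum(h[i] * X[i][j] for i in range(m)) / hh for j in range(K)]
        tt = sum(v * v for v in theta)
        if tt == 0.0:
            break                      # initial guess orthogonal to the data
        # Step d: hold theta fixed, regress each row of X on theta.
        h = [sum(X[i][j] * theta[j] for j in range(K)) / tt for i in range(m)]
        err = sum((X[i][j] - h[i] * theta[j]) ** 2
                  for i in range(m) for j in range(K))
        # Step e: stop when the fitting error no longer decreases.
        if prev - err < tol:
            break
        prev = err
    return h, theta, err


X = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]   # exactly rank one
h, theta, err = rank_one_criss_cross(X)
print(err)
```

Each half-step can only lower the fitting error, which is why the iteration is monotone. In the full problem the regressions also include the coefficients $c_i$ of the signal term.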


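Approximation (11) can be rewritten as

$$z_1 \approx \frac{1}{1 - \|P_{S''}\,x''\|^2 / \|x''\|^2} \qquad (12)$$

When doppler is being estimated, the above approximation becomes

$$z_1 \approx \max_a \frac{1}{1 - \|P_{S''}\,x''\|^2 / \|x''\|^2} \qquad (13)$$

An equivalent test statistic is

$$z_2 = \max_a \frac{x''^H P_{S''}\, x''}{x''^H x''} \qquad (14)$$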

The  operator  arg  here  means  the  solution  to  the  mini¬ 
mization  problem. 


Next  linearize  (14)  about  ao  by  keeping  the  first-order  terms 
of  its  Taylor  series  expansion.  This  results  in 


2.1. Relationship to UMPI test

We now discuss the relationship of the proposed GLRT to the optimum UMPI test. It was shown in [5] that when the signal $\sum_{k=1}^{L} c_k\,\mathcal{D}_k^{a}$ and background noise N are much weaker than the subspace interference $H\Theta$ with doppler a known, the GLRT has the approximate form

$$z_1 - 1 \approx \frac{\|P_{S''}\,x''\|^2}{\|P_{S''}^{\perp}\,x''\|^2} \qquad (11)$$

where $P_{S''} = S''(S''^H S'')^{-1}S''^H$, $P_{S''}^{\perp} = I - P_{S''}$, $P_{\Theta}^{\perp} = I - \Theta^H(\Theta\Theta^H)^{-1}\Theta$, $x'' = \text{vec}(P_H^{\perp} X P_{\Theta}^{\perp})$, and $S'' = [\,\text{vec}(P_H^{\perp}\mathcal{D}_1^{a}P_{\Theta}^{\perp}) \mid \text{vec}(P_H^{\perp}\mathcal{D}_2^{a}P_{\Theta}^{\perp}) \mid \cdots \mid \text{vec}(P_H^{\perp}\mathcal{D}_L^{a}P_{\Theta}^{\perp})\,]$. The approximation (11) and UMPI test (9) are nearly the same except for a post-multiplication of X and the signal by the projection operator $P_{\Theta}^{\perp}$. The post-multiplication of X by $P_{\Theta}^{\perp}$ corresponds to an additional temporal nulling of the data ($P_{\Theta}^{\perp}$ projects onto the orthogonal complement of the complex conjugate of the row space of the subspace interference matrix $H\Theta$, which corresponds to the time series observed by each sensor due to this interference). The extra temporal nulling can be interpreted as a loss due to estimation. Test (11) is also CFAR since it is invariant to scalings of the data matrix.

The distribution of (11) has been derived in [5] and was shown to be central and non-central F distributed under $\mathcal{H}_0$ and $\mathcal{H}_1$ respectively.

2.2.  Performance  degradation 

We now discuss the loss in performance when estimating doppler. When the signal is present and not too weak compared to the background noise N, we expect the test statistic (10) to be nearly the same as when doppler is known (at high signal-to-noise ratios doppler should be estimated accurately). However, when the signal is not present (noise only case) the value of (10) will clearly increase, resulting in an increased false alarm rate for the same threshold. We now determine the extent of the increase using (11). An exact analysis is difficult since it involves determining order statistics. Here we present an approximate analysis using (11) given that the possible range of dopplers is restricted to some small interval (often we know the feasible target velocities).


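$$\frac{x''^H P_{S''}\, x''}{x''^H x''} \approx \left.\frac{x''^H P_{S''}\, x''}{x''^H x''}\right|_{a=a_0} + (a - a_0)\left(\left.\frac{d}{da}\,\frac{x''^H P_{S''}\, x''}{x''^H x''}\right|_{a=a_0}\right) \qquad (15)$$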

If a is restricted to some small interval $[a_0 - \Delta,\, a_0 + \Delta]$, the maximum of (15) must occur at one of the end points of the interval. Therefore the maximum of (14) is approximately

$$\frac{x''^H P_{S''}\, x''}{x''^H x''} + \Delta\left|\,\frac{d}{da}\,\frac{x''^H P_{S''}\, x''}{x''^H x''}\,\right|_{a=a_0} \qquad (16)$$

where the first term in (16) is the GLRT when a is known and the last term is the perturbation due to estimating doppler.

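We now calculate the second moment of the last term in (16) (the first moment is difficult because of the absolute value), that is, the expected value of

$$e = \frac{\Delta^2}{(x''^H x'')^2}\left|\,\frac{d}{da}\, x''^H P_{S''}\, x''\,\right|^2 \qquad (17)$$

Replacing $(x''^H x'')^2$ in (17) by its expected value $(\sigma^2)^2\left((m-r)^2(K-r)^2 + 2(m-r)(K-r)\right)$ and using some results in [9] for the moments of complex Wishart distributed matrices, the expected value of e is found to be

$$E[e] \approx \frac{\Delta^2}{(m-r)^2(K-r)^2 + 2(m-r)(K-r)}\left(\text{trace}\!\left[\left(\frac{d}{da}\bar{P}_{S''}\right)^{2}\right] + \text{trace}^2\!\left[\frac{d}{da}\bar{P}_{S''}\right]\right) \qquad (18)$$

where $\bar{P}_{S''}$ is obtained by applying the previous formulas using $\bar{S}'' = [\,\text{vec}(U_0^H\mathcal{D}_1^{a}V_0) \mid \text{vec}(U_0^H\mathcal{D}_2^{a}V_0) \mid \cdots \mid \text{vec}(U_0^H\mathcal{D}_L^{a}V_0)\,]$ in place of $S''$, and the orthonormal columns of matrices $U_0$ and $V_0$ span the column spaces of $P_H^{\perp}$ and $P_{\Theta}^{\perp}$ respectively.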
Discussion 

As expected, to first order the magnitude of the perturbation (relative to the detector when doppler is known) is related to the doppler resolution of the waveform. To adjust detector thresholds, we can approximately determine the expected value of the perturbation using $\sqrt{e}$ and then (13) to determine the increase of $z_1$.
3. NUMERICAL EXAMPLES

In order to evaluate the performance of the GLRT detector a number of studies were made. We simulated an active sonar system with an array of 10 hydrophones with half-wavelength spacing. The reverberation component was modeled as arising from IID Gaussian point scatterers (Rayleigh distributed amplitudes and uniformly distributed phases) along a line perpendicular to the center of the array. The per sample reverberation power is normalized to unity.
The  ambient  noise  component  is  modeled  as  white  Gaus¬ 
sian  with  variance  3.125  X  lO""^.  The  transmitted  pulse 
is  a  .6  second  400-425  Hz  LFM  waveform.  The  received 
target  echo  is  modeled  as  Rayleigh  fading  with  variance 
1.95  X  10“^.  In  all  simulations  the  signal  is  arriving  1/2  of 
beamwidth  from  broadside,  noting  that  the  reverberation 
is  arriving  from  broadside. 

The target velocity was set at 4 m/sec. A total of 200 independent trials were performed. The UMPI test, GLRT when target velocity is known, and GLRT when target doppler is not known (doppler search is restricted to the interval 0-5 m/sec) were evaluated for each trial using the same realizations of interference and signal.
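The velocities quoted above map to doppler factors through $a = (c + v)/(c - v)$. The sketch below evaluates this for the simulated target and for a grid over the search interval. The sound speed of 1500 m/sec is an assumed nominal value for sea water and is not stated in the paper; the function name and the grid spacing are also illustrative.

```python
def doppler_factor(v, c=1500.0):
    """Contraction/dilation factor a = (c + v) / (c - v).

    v: relative target velocity in m/sec.
    c: propagation velocity in m/sec (1500 is an assumed nominal value).
    """
    return (c + v) / (c - v)


a_target = doppler_factor(4.0)
# Hypothesized dopplers for a search over 0-5 m/sec in 0.5 m/sec steps.
search_grid = [doppler_factor(0.5 * i) for i in range(11)]
print(f"a(4 m/sec) = {a_target:.6f}")
print(f"search range: {search_grid[0]:.6f} to {search_grid[-1]:.6f}")
```

The GLRT with unknown doppler would evaluate its statistic at each hypothesized value of a on such a grid and keep the maximum.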

The measured ROC curves are plotted in figure 1. Note that the unknown target doppler GLRT is close in performance to the GLRT using the correct target doppler and also the UMPI test. Next, the square root of (18) (second moment of the increase of the approximate test statistic (14)) vs. $\Delta$ is plotted in figure 2 (note that $\Delta = .004$ corresponds to a velocity change of about 30 m/sec). This is compared with the experimentally measured second moment. The plots indicate the approximations are accurate over a wide range.

4.  CONCLUSION 

The theoretical and experimental analysis indicates that the proposed GLRT detectors perform well. Furthermore, formulas are provided relating the GLRTs to the UMPI test and allowing the approximate calculation of the expected increase of the test statistic when doppler is estimated.
Figure 1. Experimentally measured ROC curves based on
200  trials.  The  curves  are  labeled  as  follows:  solid  -  UMPI, 
widely  dotted  -  GLRT  using  true  doppler,  dashed  -  GLRT 
when  doppler  is  unknown. 


Figure  2.  Expected  increase  of  test  statistic  (14).  Solid  line 
is  theoretical  and  ♦  is  experimental. 
ABSTRACT

Multiplicative jumps have been considered in many applications. These applications include speckle signal in radar images, mechanical vibrations, non-linear time series and random communication models. The problem addressed here is the detection of multiplicative jumps using the Neyman-Pearson test. This test constitutes a reference to which suboptimal detectors can be compared. In practical applications, the parameters of the noise and of the jump have to be estimated. The Maximum Likelihood Estimator and the Cramer Rao bound for these parameters are then studied.


Under hypothesis $H_0$, the process $x(n)$ is multiplied by a step of amplitude $A$ at time $n_0$:

$$y(n) = x(n)\,[1 + A\,U(n - n_0)]$$

where $U(n)$ is the Heaviside step. The Neyman-Pearson test is then defined by:

$$H_0 \text{ rejected if } \frac{L(Y \mid H_1)}{L(Y \mid H_0)} > \mu(P_{nd}) \qquad (1)$$

In (1), $L(Y \mid H_i)$ is the likelihood function for the vector $Y = [y(1), \ldots, y(N)]^t$ under hypothesis $H_i$. Using the normality of vector $Y$, $H_0$ is rejected if:


1.  INTRODUCTION 

This paper studies the performance of a multiplicative jump detector based on the Neyman-Pearson test. For the sake of simplicity, we consider the case of a shifted step embedded in a multiplicative non zero mean white Gaussian process, which leads to simultaneous mean value and variance jumps. This kind of signal has been considered in many applications. These applications include speckle signal on piecewise constant backgrounds in radar images, mechanical vibrations, non-linear time series and random communication models. In the first section, we formulate the problem and develop the optimal Neyman-Pearson test [1]. This test is optimal in the sense that it minimizes the probability of false alarm (Pfa) for fixed probability of non detection (Pnd). The second section is devoted to the estimation of the multiplicative jump parameters which leads to a suboptimal detector.


$$Z = \sum_{i=n_0+1}^{N} \left( y(i) - m\,\frac{1+A}{2+A} \right)^2 < S(P_{nd}) \qquad (2)$$


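Introducing the unit normal variable $W$ whose components, for $i > n_0$, are

$$W(i) = \frac{y(i) - m(1+A)}{\sigma(1+A)} \quad \text{under } H_0 \qquad (3)$$

and

$$W(i) = \frac{y(i) - m}{\sigma} \quad \text{under } H_1$$

we can express $Z$ as the sum of $N - n_0$ independent and identically distributed (i.i.d.) variables:

$$Z = d_j \sum_{i=n_0+1}^{N} \left( W(i) + M_j \right)^2 \quad \text{under } H_j \qquad (4)$$

with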

2.  NEYMAN  PEARSON  TEST 

The problem addressed here is the detection of multiplicative jumps using the Neyman-Pearson test.

Under hypothesis $H_1$, the signal is a stationary white Gaussian process $x(n)$ with mean $m$ and variance $\sigma^2$.


$$d_0 = \sigma^2 (1+A)^2, \quad M_0 = \frac{m}{\sigma}\,(1+A)(2+A)^{-1} \quad \text{under } H_0 \qquad (5)$$

and

$$d_1 = \sigma^2, \quad M_1 = \frac{m}{\sigma}\,(2+A)^{-1} \quad \text{under } H_1 \qquad (6)$$

Eq. 4 shows that, under hypothesis $H_j$, the distribution of $Z/d_j$ is a non-central $\chi^2$ distribution with $N - n_0$ degrees of freedom and with non-centrality parameter $\lambda_j = (N - n_0) M_j^2$ [4]. The probabilities of false alarm and of non detection can then be expressed as functions of the cumulative distribution function of a non-central $\chi^2$ distribution:


$$P_{nd} = \int_{0}^{S(P_{nd})/d_0} f_0(t)\,dt \qquad (7)$$

$$P_{fa} = \int_{S(P_{nd})/d_1}^{+\infty} f_1(t)\,dt \qquad (8)$$

In these equations, $f_j(t)$ denotes the probability density function of the non-central $\chi^2$ distribution with $N - n_0$ degrees of freedom and with non-centrality parameter $\lambda_j = (N - n_0) M_j^2$. As an example, we consider $N = 2048$ samples of a Gaussian distributed random sequence with $m = 1$ and $\sigma^2 = 1$. The multiplicative jump occurs at time $n_0 = 1024$. The variations of $P_{fa}$ and $P_{nd}$ as functions of the threshold $S$ are plotted in Fig. 1 for different jump amplitudes $A$.
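As a rough numerical companion to this example, the sketch below estimates the two error probabilities by Monte Carlo instead of through the non-central $\chi^2$ expressions. It is only an illustration under stated assumptions: all parameters are taken as known, the decision is based directly on the log-likelihood ratio of the Gaussian model, the jump is applied to samples $n_0+1, \ldots, N$, and the function names, the number of trials and the zero threshold are choices made here, not values from the paper.

```python
import numpy as np

def simulate(N, n0, m, sigma, A, rng):
    """y(n) = x(n) * gain(n), x white Gaussian N(m, sigma^2).
    The first n0 samples are unchanged, the remaining ones are scaled by 1 + A."""
    x = m + sigma * rng.standard_normal(N)
    gain = np.ones(N)
    gain[n0:] = 1.0 + A
    return x * gain

def log_likelihood_ratio(y, n0, m, sigma, A):
    """ln p(y | jump) - ln p(y | no jump); only post-jump samples contribute."""
    tail = y[n0:]
    ll_jump = -np.log(1.0 + A) - 0.5 * ((tail / (1.0 + A) - m) / sigma) ** 2
    ll_nojump = -0.5 * ((tail - m) / sigma) ** 2
    return np.sum(ll_jump - ll_nojump)

rng = np.random.default_rng(0)
N, n0, m, sigma, A = 2048, 1024, 1.0, 1.0, 0.05
trials = 500  # illustrative number of Monte Carlo runs

llr_nojump = np.array([log_likelihood_ratio(simulate(N, n0, m, sigma, 0.0, rng),
                                            n0, m, sigma, A) for _ in range(trials)])
llr_jump = np.array([log_likelihood_ratio(simulate(N, n0, m, sigma, A, rng),
                                          n0, m, sigma, A) for _ in range(trials)])

threshold = 0.0  # hypothetical threshold on the log-likelihood ratio
pfa = np.mean(llr_nojump > threshold)   # a jump is declared although none occurred
pnd = np.mean(llr_jump <= threshold)    # the jump is missed
print(f"estimated Pfa = {pfa:.3f}, estimated Pnd = {pnd:.3f}")
```

Sweeping `threshold` over a range of values traces empirical counterparts of the two curves discussed in the text.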


Figure 1: Pfa and Pnd as functions of the threshold S (Dashed Line: A = 0.05, Continuous Line: A = 0.01).

As can be seen, the Neyman-Pearson test shows good performance. This test constitutes a reference to which suboptimal detectors can be compared. To study the sensitivity of the test as a function of the jump amplitude A, we have plotted in Fig. 2 the variations of the probability of false alarm as a function of A for a fixed probability of non detection (Pnd = 0.01). As can be seen, a multiplicative jump with amplitude A > 0.1 can be detected with low probabilities of non detection and of false alarm. However, in practical applications, the parameters m, σ², A and n0 are unknown and have to be estimated. In the next part of the paper, we derive the Maximum Likelihood Estimator (MLE) and the Cramer Rao Bound for these parameters.


Figure 2: Probability of false alarm as a function of the jump amplitude A for fixed probability of non detection Pnd = 0.01.


3.  MAXIMUM  LIKELIHOOD  ESTIMATOR 


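The Maximum Likelihood principle [1] provides a method to estimate the parameter vector $\theta = (m, \sigma^2, A, n_0)^t$ from a finite length data record $Y = (y_1, \ldots, y_N)^t$. When a jump occurs at time $n_0$, the likelihood function of the Gaussian vector $Y$ is defined by:

$$L(Y;\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}\,(1+A)^{N-n_0}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left( \frac{y_i}{d_i} - m \right)^2 \right] \qquad (9)$$

with $d_i = 1$ for $i \in \{1, \ldots, n_0\}$ and $d_i = 1 + A$ for $i \in \{n_0 + 1, \ldots, N\}$. The MLE of the vector $\theta$, denoted by $\theta_{ML}$, is the one which maximizes the likelihood function over a subset $\Theta$ of $\mathbb{R} \times \mathbb{R}^{+} \times \mathbb{R} \times E$ with $E = \{1, \ldots, N\}$.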

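When the vector $(A, n_0)^t$ is known, the MLE of $(m, \sigma^2)$ is obtained by setting to 0 the partial derivatives of $L(Y;\theta)$ with respect to $m$ and $\sigma^2$:

$$m_{ML} = \frac{1}{N} \sum_{i=1}^{N} \frac{y_i}{d_i} \qquad (10)$$

$$\sigma^2_{ML} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i}{d_i} - m_{ML} \right)^2 \qquad (11)$$

When the vector $(A, n_0)^t$ is unknown, we substitute the expression of $m_{ML}$ and $\sigma^2_{ML}$ in (9) and drop the constant terms. We need now to maximize:

$$J_1(Y; A, n_0) = -(N - n_0)\,\mathrm{Ln}\,|1 + A| - \frac{N}{2}\,\mathrm{Ln} \sum_{i=1}^{N} \left( \frac{y_i}{d_i} - f(Y; A, n_0) \right)^2 \qquad (12)$$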
with:

$$f(Y; A, n_0) = \frac{1}{N} \sum_{j=1}^{N} \frac{y_j}{d_j} \qquad (13)$$

Setting to 0 the derivative of $J_1(Y; A, n_0)$ with respect to $A$ leads to a second degree equation with respect to $A$. The solution of this equation gives us an expression of $A$ as a function of the jump time $n_0$ and of the observations $y_i$, denoted by $J_2(Y; n_0)$. When $n_0$ is known, the MLE of $A$ is then given by:

$$A_{ML} = J_2(Y; n_0) \qquad (14)$$


When $n_0$ is unknown, its MLE is obtained as the argument of the maximum of the criterion $J_3(Y; n_0) = J_1(Y; A_{ML}, n_0)$, such that:

$$J_3(Y; n_{0,ML}) = \max_{n \in E} J_3(Y; n) \qquad (15)$$

In other words, the maximization of $L(Y;\theta)$ over the whole parameter vector $\theta$ is equivalent to the maximization of $J_3(Y; n_0)$ with respect to $n_0$ only. The MLE of $m$, $\sigma^2$ and $A$ are then given by replacing $n_0$ by $n_{0,ML}$ in (10), (11) and (14). Note that the maximization of $J_3(Y; n_0)$ with respect to $n_0$ is a discrete maximization which is very simple to implement. The mean and standard deviation of the MLE are shown in Fig. 2 a), b), c) and d) and compared to the true parameters $m = 1$, $\sigma^2 = 1$, $A = 0.5$ and $n_0 = N/2$ for different numbers of samples $N$.
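The discrete maximization described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: instead of solving the second degree equation for $A$, it scans a grid of candidate amplitudes (an assumption made here for brevity), and for each pair $(A, n_0)$ it plugs the closed-form estimates of $m$ and $\sigma^2$ into the profile criterion. The function names, the grids and the simulated parameter values are all hypothetical.

```python
import numpy as np

def profile_criterion(y, A, n0):
    """Profile log-likelihood (constants dropped) for a jump of amplitude A
    applied to samples n0+1..N, with m and sigma^2 replaced by their MLEs."""
    N = len(y)
    d = np.ones(N)
    d[n0:] = 1.0 + A
    z = y / d
    m_hat = z.mean()
    s2_hat = np.mean((z - m_hat) ** 2)
    J = -(N - n0) * np.log(abs(1.0 + A)) - 0.5 * N * np.log(s2_hat)
    return J, m_hat, s2_hat

def mle_jump(y, A_grid, n0_grid):
    """Exhaustive search over (A, n0); returns (m_hat, s2_hat, A_hat, n0_hat)."""
    best = None
    for n0 in n0_grid:
        for A in A_grid:
            J, m_hat, s2_hat = profile_criterion(y, A, n0)
            if best is None or J > best[0]:
                best = (J, m_hat, s2_hat, A, n0)
    return best[1:]

# Illustrative data: values chosen here so that the jump is easy to see.
rng = np.random.default_rng(1)
N, n0_true, m_true, sigma_true, A_true = 300, 150, 5.0, 1.0, 2.0
y = m_true + sigma_true * rng.standard_normal(N)
y[n0_true:] *= 1.0 + A_true

A_grid = np.arange(0.0, 4.0001, 0.05)   # candidate amplitudes (avoids A = -1)
n0_grid = range(10, N - 10)             # keep some samples on both sides
m_hat, s2_hat, A_hat, n0_hat = mle_jump(y, A_grid, n0_grid)
print(m_hat, s2_hat, A_hat, n0_hat)
```

The grid over $A$ trades accuracy for simplicity; replacing it with the closed-form root of the quadratic equation would remove the inner loop.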


Fig. 2. Mean and Standard Deviation of the estimated parameters for different numbers of samples N
(a) $A_{ML}$ (b) $m_{ML}$ (c) (d)


4.  CRAMER  RAO  BOUND 


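It is well known that the covariance matrix of any unbiased estimator cannot be smaller than the inverse of the Fisher information matrix, known as the Cramer-Rao bound (CRB). For a parameter vector $\theta = (\theta_1, \ldots, \theta_p)^t$, the elements of the Fisher information matrix are given by:

$$[I_\theta]_{kl} = -E\left[ \frac{\partial^2 \ln L(Y;\theta)}{\partial\theta_k\,\partial\theta_l} \right], \quad 1 \le k, l \le p \qquad (16)$$

where $L(Y;\theta)$ is the probability density function of the vector $Y = (y_1, \ldots, y_N)^t$. For Gaussian time series, many equivalent expressions for the Fisher information matrix can be found in the literature [3]. For instance, we have:

$$[I_\theta]_{kl} = \left( \frac{\partial m(\theta)}{\partial\theta_k} \right)^t R^{-1}(\theta)\, \frac{\partial m(\theta)}{\partial\theta_l} + \frac{1}{2}\,\mathrm{tr}\left\{ R^{-1}(\theta) \frac{\partial R(\theta)}{\partial\theta_k} R^{-1}(\theta) \frac{\partial R(\theta)}{\partial\theta_l} \right\} \qquad (17)$$

where tr{.} denotes the trace operator and $R(\theta)$, $m(\theta)$ are the covariance matrix and the mean of the vector $Y = (y_1, \ldots, y_N)^t$.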
When $n_0$ is known, using eq. (16) or (17), the Fisher information matrix corresponding to the vector $\alpha = (m, \sigma^2, A)^t$ can be computed. The following results can be obtained:

$$I_\alpha = \begin{pmatrix} \dfrac{N}{\sigma^2} & 0 & \dfrac{m(N-n_0)}{(1+A)\sigma^2} \\[2mm] 0 & \dfrac{N}{2\sigma^4} & \dfrac{N-n_0}{(1+A)\sigma^2} \\[2mm] \dfrac{m(N-n_0)}{(1+A)\sigma^2} & \dfrac{N-n_0}{(1+A)\sigma^2} & \dfrac{N-n_0}{(1+A)^2}\left[ \dfrac{m^2}{\sigma^2} + 2 \right] \end{pmatrix}$$

The determinant of this matrix is of the form $\det(I_\alpha) = C\,n_0\,(N - n_0)$ with $C > 0$. Thus, when the jump occurs at time $n_0 = N$ or $n_0 = 0$, $I_\alpha$ is singular and we cannot estimate $A$ with (14).
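The singularity at $n_0 = 0$ and $n_0 = N$ can be checked numerically. The sketch below evaluates the standard Gaussian Fisher information (mean-derivative term plus one half of the trace term) for the parameter vector $(m, \sigma^2, A)$, assuming, as in the likelihood above, independent samples with mean $m\,d_i$ and variance $\sigma^2 d_i^2$. The function name and the numerical values are illustrative choices made here.

```python
import numpy as np

def fisher_information(N, n0, m, sigma2, A):
    """3x3 Fisher information for (m, sigma^2, A) with n0 known.
    Samples are independent, so every covariance matrix is diagonal
    and is stored as a vector of diagonal entries."""
    d = np.ones(N)
    d[n0:] = 1.0 + A
    u = (np.arange(N) >= n0).astype(float)      # 1 on post-jump samples
    var = sigma2 * d ** 2                       # diagonal of R(theta)
    mean_grads = [d, np.zeros(N), m * u]        # d mean / d (m, sigma^2, A)
    cov_grads = [np.zeros(N), d ** 2, 2.0 * sigma2 * d * u]  # diag of dR/dtheta
    info = np.empty((3, 3))
    for k in range(3):
        for l in range(3):
            info[k, l] = (np.sum(mean_grads[k] * mean_grads[l] / var)
                          + 0.5 * np.sum(cov_grads[k] * cov_grads[l] / var ** 2))
    return info

N, m, sigma2, A = 64, 1.0, 1.0, 0.5
dets = {n0: np.linalg.det(fisher_information(N, n0, m, sigma2, A))
        for n0 in (0, N // 2, N)}
print(dets)
```

For these values the determinant is numerically zero at both ends of the record and strictly positive in the middle, in line with the form $C\,n_0\,(N - n_0)$.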

When $n_0$ is unknown, the parameter vector is $\theta = (m, \sigma^2, A, n_0)^t$. The problem of finding a bound for the covariance matrix of $\theta_{ML}$ becomes difficult because $n_0$ is a discrete parameter (belonging to the set $\{1, \ldots, N\}$). If we consider that the jump occurs at time $t_0 \in [0, T]$, the MLE of $t_0$ is obtained by maximizing $J_3(Y; n_0 + 1)$, $n_0 = \mathrm{int}(t_0)$ being the integer part of $t_0$. In this case, we cannot differentiate the likelihood function with respect to $t_0$, which prevents us from computing the terms $[I_\theta]_{j4}$ of the Fisher information matrix. Thus, the CRB for the vector $\theta$ cannot be computed.

For a known parameter $n_0$, a comparison between the mean square error (MSE) of $\theta_{ML}$ estimated with $N_r = 500$ Monte-Carlo runs and the CRB is presented in Fig. 3. In this figure, the MSE of $\theta_{ML}$ in the case of known and unknown parameter $n_0$ is also compared.


5.  CONCLUSION 

The optimal Neyman-Pearson multiplicative jump detector is derived. For fixed probability of non detection, the threshold minimizing the probability of false alarm can be determined. This test constitutes a reference to which suboptimal detectors can be compared. In practical applications, the parameters of the noise and of the jump are unknown. The Maximum Likelihood Estimator and the Cramer Rao Bound for these parameters are then studied. The next step in our study will be to compare our results with a wavelet based detection strategy [2].
Abstract

The problem treated in this paper is the Bayesian estimation of the variance of the sampling jitter occurring when a process is irregularly observed. This problem is often met in practice [2], and has already received treatment in [1][5] using higher order statistics. The Bayesian solution to this problem is performed using powerful stochastic algorithms, the MCMC (Markov Chain Monte Carlo) methods.


1.  Statement  of  the  problem 

1.1.  Motivations 

The problem addressed in this paper is the estimation of the variance of the jitter occurring while sampling irregularly a process whose a priori density is known. This problem has already received treatment in [5] for example, and [1] in the case of a Gaussian process, using higher order statistics: the second method is mainly based on the fact that a continuous Gaussian process does not give rise to a discrete Gaussian process when irregularly observed. In this paper we propose a Bayesian statistical approach for estimating this quantity, in a wider framework, because we remove the Gaussian assumption.

The main interests of the approach we develop in this paper are that:

- it does not require a lot of observations (as in the case of higher order statistics),

- we remove the assumption of Gaussianity of the continuous process which is sampled,

- we estimate the a posteriori probability density of the jitter (thus allowing the calculation of conditional expectations, confidence intervals...),

- we use stochastic algorithms, the MCMCs (Markov Chain Monte Carlo), which have been very popular for fifty years in statistical physics, and more recently in image processing and statistics, but are not yet popular in signal processing despite their power.

1.2.  Assumptions-Notations 

- $x(t)$ is a continuous time process,

- this process is sampled at times:

$$t_n = n + \varepsilon_n \qquad (1)$$

where the $\varepsilon_n$ are zero mean iid, and we note $y_n = x(t_n)$.

- $y_{1 \to N}$ is a useful notation for $(y_n)_{n=1,\ldots,N}$.

2.  Bayesian  solution  to  the  problem 

We wish to estimate the following density:

$$p(\sigma / y_{1 \to N}) \qquad (2)$$

where $\sigma$ is the variance of the sampling jitter, and $y_{1 \to N}$ the observations.

Remark 1 it is worth noticing that in this paper, we restrict ourselves to $\sigma$ as an unknown parameter of the distribution of the time perturbations. Obviously $\sigma$ could be replaced in all that follows by the complete finite dimensional parameter vector $\theta$ characterizing the distribution.

This problem can be thought of as a missing data problem, where the $y_{1 \to N}$ are the incomplete data, which can be completed by the $\varepsilon_{1 \to N}$ to form the complete data set $(y_{1 \to N}, \varepsilon_{1 \to N})$.

We thus use a stochastic algorithm that allows us to estimate the following joint density:

$$p(\sigma, \varepsilon_{1 \to N} / y_{1 \to N}) \qquad (3)$$

and thus the required density, $p(\sigma / y_{1 \to N})$.

A natural method would be here to use the Gibbs sampler [4], which consists in drawing iteratively and alternatively subsets of the parameters, according to their densities conditional on the others, thus building a chain of samples. Under mild conditions [4], the joint density of the samples drawn as described above will converge to $p(\sigma, \varepsilon_{1 \to N} / y_{1 \to N})$, thus providing a representation of this joint density.

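In our case, sampling from the Gibbs sampler amounts to drawing iteratively and alternatively from the following laws:

$$p(\sigma / y_{1 \to N}, \varepsilon_{1 \to N}) \qquad (4)$$

$$p(\varepsilon_{1 \to N} / \sigma, y_{1 \to N})$$

which, using Bayes's rule, yields:

$$p(\sigma / y_{1 \to N}, \varepsilon_{1 \to N}) \propto p(\varepsilon_{1 \to N} / \sigma)\,p(\sigma) \qquad (5)$$

$$p(\varepsilon_{1 \to N} / \sigma, y_{1 \to N}) \propto p(y_{1 \to N} / \varepsilon_{1 \to N})\,p(\varepsilon_{1 \to N} / \sigma)$$

Where:

- $\propto$ means here "is proportional to",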

-  we  note  that  p  (^i — yn  2/1 — vN )  oc  p  (^i — ^.jv  /cr)^ 
and  that  p{y\ — vJv/cr,  £1 — yf^)  — 

p(yi_^.iv  )■ 

Remark 2 it is worth noticing that without any additional difficulty it is possible to suppose that the process is embedded in noise, thus taking into account thermal noise and quantification noise, which was not so simple with previous methods. In that case, one would only write:

piyi^N  /<r,ei^N^xi^N  )  (xp{xi-yN  lyi^N  ) 

xp{yi~^N  /si—+n) 

p{<^ /yi — — yN  )  =  p(^  /yi — >-N,  £1 — yN  ) 

p{ei - i-N  l<r,  X\ - yN^  yi - yN  )  oc  p  (xi - yjv  lyi - yN  ) 

Xp(yi — ►AT  /ei— ) 

xp(ei— ►jv  /cr)  (6) 

where the $x_{1 \to N}$ are the observations including noise, which is assumed to be stationary for the sake of convenience. We notice that it requires being able to sample from $p(y_{1 \to N} / \varepsilon_{1 \to N})$.

The simulation algorithm could thus be summarized as follows [4]:

1. Draw $\sigma$ according to $p(\sigma / y_{1 \to N}, \varepsilon_{1 \to N})$.

2. Draw $\varepsilon_{1 \to N}$ according to $p(\varepsilon_{1 \to N} / \sigma, y_{1 \to N})$.

3. Go to 1.

Nevertheless in practice it is generally impossible to perform such an algorithm directly:
- in most cases, it is impossible to apply an accept/reject procedure [4] using (5), as it would require determining an efficient generating density, so as to ensure a good acceptance rate, and evaluating analytically the normalization constant of the likelihood.

- in most cases, one cannot determine prior conjugate densities, which would simplify the drawing procedure. For example in the case where the $\varepsilon_{1 \to N}$ are iid
Gaussian,  one  could  choose  cr^  '^Inv— 

-  due  to  the  limited  precision  of  computers  small 
values  are  rounded  to  zero,  and  the  algorithm  may  not 
converge.  This  often  happens  when  we  deal  with  joint 
densities  of  large  size  vectors. 

In order to circumvent all those problems, we propose a combination of two MCMC, more precisely a product of two Metropolis-Hastings (M-H) kernels [4], whose respective invariant densities are the full conditional densities, $p(\sigma / y_{1 \to N}, \varepsilon_{1 \to N})$ and $p(\varepsilon_{1 \to N} / \sigma, y_{1 \to N})$. This algorithm is no longer a Gibbs sampler (which nevertheless is a particular case of a product of specific M-H kernels). We use M-H based on a random walk [4] (i.e. a simple Metropolis algorithm), that is we make the parameters evolve with random increments: in what follows, the scalar increment for $\sigma$ will be named $z$ and will be distributed according to $q_z$, which must be symmetric [4]. The vector increment for $\varepsilon_{1 \to N}$ will be $u_{1 \to N}$, distributed according to $q_u$, which must also be symmetric. (We notice that we could make the $q$ depend on the preceding step.) This provides a general algorithm which is guaranteed to work in all situations (under convergence conditions), but which may not be computationally efficient in particular cases, and thus would have to be adapted.

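The algorithm can thus be summarized as follows:

1. Draw $\sigma^{(0)}$ according to $p(\sigma)$ and $\varepsilon^{(0)}_{1 \to N}$ conditionally to $\sigma^{(0)}$. Set $k = 0$.

2. Set $\sigma_1 = \sigma^{(k)}$. while (i < M) {
   (Initialization: i = 0)
   i = i + 1
   Draw $z$ according to $q_z$
   Set $\sigma_2 = \sigma_1 + z$
   Calculate $\alpha = \min\left\{ \dfrac{p(\varepsilon_{1 \to N} / \sigma_2)\,p(\sigma_2)}{p(\varepsilon_{1 \to N} / \sigma_1)\,p(\sigma_1)},\ 1 \right\}$
   Accept the value $\sigma_2$ with probability $\alpha$.
   If $\sigma_2$ is accepted then set $\sigma_1 = \sigma_2$.
   }

3. $\sigma^{(k+1)} = \sigma_1$.

4. Set $\varepsilon^{1}_{1 \to N} = \varepsilon^{(k)}_{1 \to N}$. while (i < M') {
   (Initialization: i = 0)
   i = i + 1
   Draw $u_{1 \to N}$ according to $q_u$
   Set $\varepsilon^{2}_{1 \to N} = \varepsilon^{1}_{1 \to N} + u_{1 \to N}$
   Calculate $\alpha = \min\left\{ \dfrac{p(y_{1 \to N} / \varepsilon^{2}_{1 \to N})\,p(\varepsilon^{2}_{1 \to N} / \sigma^{(k+1)})}{p(y_{1 \to N} / \varepsilon^{1}_{1 \to N})\,p(\varepsilon^{1}_{1 \to N} / \sigma^{(k+1)})},\ 1 \right\}$
   Accept $\varepsilon^{2}_{1 \to N}$ with probability $\alpha$.
   If $\varepsilon^{2}_{1 \to N}$ is accepted then set $\varepsilon^{1}_{1 \to N} = \varepsilon^{2}_{1 \to N}$.
   }

5. $\varepsilon^{(k+1)}_{1 \to N} = \varepsilon^{1}_{1 \to N}$, $k = k + 1$.

6. go to 2.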


One of the advantages of using the M-H kernels is that only ratios of densities appear, which avoids the numerical problems discussed above.

Remark 3 one notices that in the case where M, M' are sufficiently high, so that the stationary regime is reached, this would lead to a Gibbs sampler. Nevertheless, as will be shown in the next section, this is not required, but the algorithm obtained in such a case is no longer a Gibbs sampler.
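The hybrid scheme of this section can be sketched on a small synthetic problem. The code below is a minimal illustration under assumptions that are not taken from the paper: the continuous process is Gaussian with covariance $\exp(-|\tau|)$, the jitter is parametrized by its standard deviation `s` (truncated normal on [-0.5, 0.5], uniform prior on (0, 0.3]), and the step sizes, numbers of inner iterations and function names are arbitrary. Acceptance decisions are taken on log-ratios, which is one way to benefit from the fact that only ratios of densities are needed.

```python
import math
import numpy as np

BOUND, S_MAX = 0.5, 0.3   # jitter support [-BOUND, BOUND]; uniform prior on (0, S_MAX]

def log_lik(y, eps):
    """log p(y | eps) up to a constant, for samples taken at t_n = n + eps_n.
    Assumed covariance R(tau) = exp(-|tau|); a tiny nugget keeps K invertible."""
    t = np.arange(len(y)) + eps
    K = np.exp(-np.abs(t[:, None] - t[None, :])) + 1e-8 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (logdet + y @ np.linalg.solve(K, y))

def log_p_eps(eps, s):
    """log p(eps | s): iid N(0, s^2) truncated to [-BOUND, BOUND]."""
    if np.any(np.abs(eps) > BOUND):
        return -np.inf
    log_norm = math.log(s * math.sqrt(2 * math.pi) * math.erf(BOUND / (s * math.sqrt(2))))
    return -0.5 * np.sum(eps ** 2) / s ** 2 - len(eps) * log_norm

def hybrid_sampler(y, n_sweeps=200, M=5, M_prime=5, step_s=0.05, step_eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.01, S_MAX)                       # initial draw of s
    eps = np.clip(s * rng.standard_normal(len(y)), -BOUND, BOUND)  # crude draw given s
    s_chain = np.empty(n_sweeps)
    for k in range(n_sweeps):
        # Block 1: M random-walk Metropolis steps on s, target p(eps | s) p(s).
        lp = log_p_eps(eps, s)
        for _ in range(M):
            cand = s + step_s * rng.standard_normal()
            if 0.0 < cand <= S_MAX:                    # outside the prior support: reject
                lp_cand = log_p_eps(eps, cand)
                if -rng.exponential() < lp_cand - lp:  # same as log(U) < log-ratio
                    s, lp = cand, lp_cand
        # Block 2: M' random-walk steps on eps, target p(y | eps) p(eps | s).
        lp = log_lik(y, eps) + log_p_eps(eps, s)
        for _ in range(M_prime):
            cand = eps + step_eps * rng.standard_normal(len(y))
            lp_cand = log_p_eps(cand, s)
            if np.isfinite(lp_cand):
                lp_cand += log_lik(y, cand)
                if -rng.exponential() < lp_cand - lp:
                    eps, lp = cand, lp_cand
        s_chain[k] = s
    return s_chain, eps

# Synthetic data generated from the assumed model (illustrative values).
rng = np.random.default_rng(1)
N = 20
eps_true = np.clip(0.1 * rng.standard_normal(N), -BOUND, BOUND)
t = np.arange(N) + eps_true
K = np.exp(-np.abs(t[:, None] - t[None, :])) + 1e-8 * np.eye(N)
y = np.linalg.cholesky(K) @ rng.standard_normal(N)

s_chain, eps_last = hybrid_sampler(y)
print("mean of s over the second half of the chain:", s_chain[100:].mean())
```

A histogram of `s_chain` after a burn-in period gives a rough picture of the posterior density of the jitter parameter; with so few data and sweeps it should be read as a demonstration of the mechanics only.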


3.  Convergence  of  the  algorithm 


We give sufficient conditions to ensure convergence of this stochastic algorithm using Markov chain theory. Let $E$ be the state space of the Markov chain. We assume $E = E_1 \times E_2$ is an open connected subset of $\mathbb{R}^{N} \times \mathbb{R}$. Furthermore we use $x_1, y_1$ to denote elements of $E_1$ and $x_2, y_2$ to denote elements of $E_2$. Let $\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2$ be the Borel $\sigma$-field on $E$.

We make the following assumptions:

- $\pi(dx)$ admits a strictly positive density $\pi(x)$ on $E$ with respect to the Lebesgue measure.

- $\pi_{1|2}(dx_1 | x_2)$ and $\pi_{2|1}(dx_2 | x_1)$ admit densities $\pi_{1|2}(x_1 | x_2)$ and $\pi_{2|1}(x_2 | x_1)$ on their space with respect to the Lebesgue measure.

- $\forall y_1 \in E_1$, $q(x_1, y_1 | x_2)$ (resp. $\forall x_2 \in E_2$, $q(x_2, y_2 | y_1)$) is strictly positive on $E_1 \times E_1$ (resp. $E_2 \times E_2$).

The transition probability kernel $P : E \times \mathcal{E} \to [0, 1]$ is in our case defined as follows. Let us first consider the 'local' transition kernels:

$$P_1(x_1, dy_1 | x_2) = q(x_1, y_1 | x_2)\,\alpha(x_1, y_1 | x_2)\,dy_1 + \underbrace{\left[ 1 - \int q(x_1, y_1 | x_2)\,\alpha(x_1, y_1 | x_2)\,dy_1 \right]}_{r(x_1 | x_2)} \delta_{x_1}(dy_1)$$

where $\alpha(x_1, y_1 | x_2)$ is the probability of accepting the candidate $y_1$ sampled according to $q(x_1, y_1 | x_2)$. In a zero-mean random walk setup, $\alpha(x_1, y_1 | x_2) = \min\left( \dfrac{\pi_{1|2}(y_1 | x_2)}{\pi_{1|2}(x_1 | x_2)},\ 1 \right)$ if $\pi_{1|2}(x_1 | x_2)\,q(x_1, y_1 | x_2) > 0$ and 1 elsewhere.

Similarly

$$P_2(x_2, dy_2 | y_1) = p(x_2, y_2 | y_1)\,dy_2 + r(x_2 | y_1)\,\delta_{x_2}(dy_2)$$

The total transition kernel is thus equal to

$$P((x_1, x_2), (A_1 \times A_2)) = \int_{A_1} \int_{A_2} P_1(x_1, dy_1 | x_2)\,P_2(x_2, dy_2 | y_1)$$

To establish that this Markov chain converges towards the required posterior density $\pi(x_1, x_2)$, it is sufficient to show that $P$ admits $\pi$ as invariant density and that $P$ is $\pi$-irreducible and aperiodic.

- Under the above assumptions, $\forall (x, A) \in E \times \mathcal{E}$, $P(x, A) > 0$, thus $P$ is $\pi$-irreducible and aperiodic.

- $\pi$ is invariant for $P$. Indeed,

$$\begin{aligned}
&\int\!\!\int P_1(x_1, dy_1 | x_2)\,P_2(x_2, dy_2 | y_1)\,\pi(x_1, x_2)\,dx_1\,dx_2 \\
&= \int P_2(x_2, dy_2 | y_1) \left[ \int P_1(x_1, dy_1 | x_2)\,\pi(x_1 | x_2)\,dx_1 \right] \pi(x_2)\,dx_2 \\
&= \int P_2(x_2, dy_2 | y_1)\,\pi_{1|2}(dy_1 | x_2)\,\pi(x_2)\,dx_2 \\
&= \int P_2(x_2, dy_2 | y_1)\,\frac{\pi(dy_1, x_2)}{\pi(x_2)}\,\pi(x_2)\,dx_2 \\
&= \pi_1(dy_1) \int P_2(x_2, dy_2 | y_1)\,\pi_{2|1}(x_2 | y_1)\,dx_2 \\
&= \pi_1(dy_1)\,\pi_{2|1}(dy_2 | y_1) = \pi(dy_1, dy_2)
\end{aligned}$$

From $\pi$-irreducibility and aperiodicity one deduces that,

$$\| P^n(x, \cdot) - \pi \| \to 0 \quad \forall x \in E$$

where $\|\cdot\|$ is the total variation norm. Contrary to what is claimed in many papers, one cannot conclude about ergodicity of the Markov chain, because it would require establishing in addition that $P$ is Harris recurrent. In many cases - when there is no measure-theoretic pathology - irreducibility implies Harris recurrence. However there exist as yet no general results on the convergence of hybrid samplers, and we have not been able to establish this property rigorously. If that property were true, then this Markov chain would be ergodic, i.e. for any real-valued $\pi$-integrable function $f$ one would have

$$\frac{1}{n} \sum_{k=1}^{n} f(X_k) \to \int f(x)\,\pi(dx) \quad \text{almost surely}$$


4. Simulation


In order to compare the performance of our method with the performance of previous methods we have chosen to follow [1]. The continuous process is thus assumed to be Gaussian, but we notice that normality is not required to perform the algorithm. One must only be able to evaluate $p(y_{1 \to N} / \varepsilon_{1 \to N})$.




- The correlation matrix of the Gaussian process is:

where the $t_n$ are the sampling times.

- The perturbations $\varepsilon_n$ are iid centered normal (in the simulation $\mathcal{N}(0, .07)$) restricted to [-.5, +.5] in order to place ourselves in a case similar to [1] and to ensure the convergence of the algorithm.

- $\sigma$ has a noninformative prior, i.e. is distributed according to the uniform law on [0, .3].

- the increment laws are $\mathcal{N}(0, .3)$ and multivariate $\mathcal{N}(0, .3\,I_N)$.

- the number of iterations for both Metropolis-Hastings algorithms is 200.

- we performed the algorithm in two cases: 50 and 100 data available.

The results are good, and even better than those obtained by [1] although only a small amount of data was available.


[Figure: results with 50 data]


5.  Conclusion 

[Figure: results with 100 data]

In our paper, we propose an original and efficient solution to a problem which is of interest in many applications, when data are not sampled at regular intervals. This is a typical missing data problem that we have solved in a Bayesian framework using MCMC: this avoids the complex expectation evaluation and often intractable global optimisation encountered when using the E-M algorithm and related versions. Our solution allows one to estimate not only the density of the variance of the jitter, but also the densities of each of the perturbations. We believe that the procedure is sufficiently flexible to be applied to many situations met in practice. Of course, MCMC techniques have many other applications in statistical signal processing, see [3] for an application to deconvolution.
Abstract

We address the problems of modeling Doppler-shifted wide-band Gaussian random processes and of estimating the Doppler parameter from a finite series of discrete-time samples. Relations between the continuous-time process, the Doppler shift parameter, and the discrete-time process obtained by sampling are established. Approximate rational models are proposed. Various estimators are proposed for the Doppler parameter when the second-order statistics of the original continuous-time random process are known. The Cramer-Rao bound is derived. The estimators are compared experimentally on synthetic Doppler-shifted data. We also hint at some extensions of the method to non-stationary processes and time-varying Doppler shifts.


1.  Introduction 

The Doppler-shift effect is well-known for narrow-band signals emitted by moving sources (Fig. 1). In that case, freshman's physics tells us that a harmonic wave of frequency $\omega_0$ emitted by a point source moving toward a fixed receiver with speed $v$ is observed with apparent frequency $\omega = (1 + M)\,\omega_0$, where $M = v/c$ is the Mach number. In this paper, we consider wide-band moving sources that are modeled by continuous-time Gaussian random processes with known statistics. We address the problems involved in estimating the Doppler shift from samples of signals emitted by such sources. The estimators proposed here are motivated by applications in acoustics, e.g., in environmental sound monitoring, where wide-band moving sources are often encountered. For example, the maximum-likelihood estimator, or one of its approximations, can be used in a GLRT for detection and classification of Doppler-shifted wide-band processes given a dictionary of possible non-shifted spectra.
Figure 1. Source moving toward the receiver.


2.  Mathematical  formulation 

Let $x_c(t)$ be a continuous-time Gaussian zero-mean stationary random process modeling the signal emitted by the moving point source S and let $y_c(t)$ be the signal observed at the fixed receiver O (see Fig. 1). For simplicity, we consider the one-dimensional case of a source moving at a constant speed $v$ toward O. It can be shown that in the far field $y_c(t)$ is related to $x_c(t)$ by [5]

$$y_c(t) \approx \sigma\,x_c(\alpha t - \delta), \qquad (1)$$

where $\alpha = 1 + v/c$ is the Doppler shift factor, $\sigma$ is some attenuation factor due to the propagation of the acoustic wave, $\delta$ is a propagation delay, and $c$ is the wave propagation speed. Equation (1) is also valid for sources moving away from the receiver: in that case $v < 0$ and, consequently, $\alpha < 1$. It is straightforward to see that $y_c(t)$ is also a Gaussian zero-mean stationary random process with covariance function

$$R_{y_c}(\tau) = E[y_c(t)\,y_c(t - \tau)] = \sigma^2 R_{x_c}(\alpha\tau), \qquad (2)$$

where $R_{x_c}(\tau)$ denotes the covariance function of the stationary process $x_c(t)$. Equivalently, the power spectral densities (PSD) of the processes $x_c(t)$ and $y_c(t)$ are related by

$$S_{y_c}(\omega) = \frac{\sigma^2}{\alpha}\,S_{x_c}\!\left( \frac{\omega}{\alpha} \right), \qquad (3)$$

where $S_{x_c}(\omega)$ and $S_{y_c}(\omega)$ denote the PSD of $x_c(t)$ and $y_c(t)$, respectively.

In practice, we only have access to sampled versions, $x[n] = x_c(nT_s)$ and $y[n] = y_c(nT_s + t_0)$, $n \in \mathbb{N}$, of the continuous-time processes $x_c(t)$ and $y_c(t)$; $t_0$ is some arbitrary time instant, and $\omega_s = 2\pi/T_s$ is the radian sampling


0-8186-7576-4/96  $5.00  ©  1996  IEEE 




frequency. Both discrete-time random processes are Gaussian and zero-mean, with covariance sequences given by

$$R_x[k] = R_{x_c}(kT_s), \qquad R_y[k] = \sigma^2 R_{x_c}(\alpha k T_s). \qquad (4)$$

Their PSD's are related to the PSD's of their continuous-time counterparts by

$$S_i(\Omega) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} S_{i_c}\!\left( \frac{\Omega + 2k\pi}{T_s} \right), \quad i = x, y \qquad (5)$$

interesting to note that, in the general case, the poles in the z-plane move on logarithmic spirals, not on circles.

The rational modeling approach just introduced also suggests an efficient way to artificially synthesize Doppler-shifted processes with rational spectra. If a white Gaussian sequence is used as the input of a digital filter with transfer function (10), it is straightforward to see that the output sequence of the filter will be Gaussian and have the desired spectral shape. The scaling by $\sigma$ is trivial. This method allows easy implementation of the synthesizer, even for varying shift $\alpha$, via the parallel form of the filter defined by (10).
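The pole displacement can be illustrated numerically. The sketch below assumes the usual impulse-invariance mapping of a continuous-time pole $s_k$ to $e^{s_k T_s}$, with the sampling period replaced by $\alpha T_s$ under a Doppler shift; the function name and the pole values are hypothetical and chosen only for illustration.

```python
import numpy as np

def doppler_poles(s_poles, alpha, Ts):
    """Map continuous-time poles s_k to z-plane poles exp(alpha * s_k * Ts)."""
    return np.exp(alpha * np.asarray(s_poles, dtype=complex) * Ts)

Ts = 0.5
s_poles = [-0.2 + 2.0j, 3.0j]          # one damped pole, one purely imaginary pole
alphas = np.linspace(0.9, 1.1, 5)      # a few Doppler shift factors around 1

z = np.array([doppler_poles(s_poles, a, Ts) for a in alphas])
radius, angle = np.abs(z), np.angle(z)

for a, r, th in zip(alphas, radius, angle):
    print(f"alpha={a:.2f}  |z|={np.round(r, 4)}  arg(z)={np.round(th, 4)}")
```

For the damped pole, $\log|z|$ and $\arg z$ both scale linearly with $\alpha$, so the pole traces a logarithmic spiral; for the purely imaginary pole the modulus stays at 1 and only the angle changes, which is the rotation on the unit circle described in the text.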


where $\Omega \in [-\pi, \pi]$ is the normalized radian frequency.

The effect of the Doppler shift can be viewed as a change of sampling period with respect to the original signal $x_c(t)$ from $T_s$ to $\alpha T_s$. It is clear from (5) that the Nyquist condition (non-aliasing) for $y[n]$ is that $x_c(t)$ must be band-limited to $W < \omega_s / 2\alpha$. If this condition holds, then we have

$$S_y(\Omega) = \frac{\sigma^2}{\alpha}\,S_x\!\left( \frac{\Omega}{\alpha} \right) = \frac{\sigma^2}{\alpha T_s}\,S_{x_c}\!\left( \frac{\Omega}{\alpha T_s} \right) \qquad (6)$$


3.  Rational  Modeling 


Let us assume that the PSD of $x_c(t)$ is rational (i.e., $x_c(t)$ is the output of a linear filter excited by white Gaussian noise),

$$S_{x_c}(\omega) = |H_{x_c}(j\omega)|^2 = \left| \frac{P(j\omega)}{Q(j\omega)} \right|^2 \qquad (7)$$


4.  Estimators  of  Doppler  parameters 


Let $y = (y[0], y[1], \ldots, y[N-1])^t$ be a length $N$ sample of the process $y[n]$. We address the problem of estimating the Doppler parameters $\alpha$ and $\sigma$ from $y$ when the statistics of $x_c(t)$ are known (i.e., either $R_{x_c}(\tau)$, $S_{x_c}(\omega)$, or a rational model of $S_x(\Omega)$ is known).

Clearly, there are situations in which it will not be possible to identify the Doppler parameters, a trivial example being the white noise case. An identifiability condition for $\alpha$ and $\sigma$ from $y_c(t)$ is that $\sigma^2 R_{x_c}(\alpha\tau) = \sigma'^2 R_{x_c}(\alpha'\tau)$, $\forall \tau$, implies $\alpha = \alpha'$ and $\sigma = \sigma'$. It can be shown that if $x_c(t)$ is of finite power the identifiability condition holds. For the discrete process $y[k]$, the identifiability condition becomes: $R_y(\alpha, \sigma)[k] = R_y(\alpha', \sigma')[k]$, $\forall k \in \mathbb{N}$, or, alternately, $S_y(\Omega; \alpha, \sigma) = S_y(\Omega; \alpha', \sigma')$, implies $\alpha = \alpha'$ and $\sigma = \sigma'$.


where $P(s)$ and $Q(s)$ are polynomials in $s$. Let

$$H_{x_c}(s) = \sum_{k=1}^{p} \frac{A_k}{s - s_k} \qquad (8)$$

be the partial fraction expansion of $H_{x_c}(s)$, where we have assumed without loss of generality that all the poles of $H_{x_c}(s)$ are simple. Let us further assume that $x_c(t)$ is essentially band-limited¹ to frequency $W < \omega_s / 2\alpha$. Then, by the impulse invariant method [6], it is easy to show that

$$S_y(\Omega) \approx \sigma^2 \alpha T_s\,|H_\alpha(e^{j\Omega})|^2 \qquad (9)$$


4.1. Maximum-likelihood estimator

A finite length vector $y$ is a realization of a Gaussian zero-mean random variable with covariance matrix

$$\Sigma_y(\theta) = \sigma^2\,\Sigma_x(\alpha) \qquad (11)$$

with $\theta = \{\alpha, \sigma\}$. Thus, estimating $\theta$ given $y$ can be viewed as a structured covariance estimation problem. The covariance matrix $\Sigma_y(\theta)$ is linear in $\sigma^2$, but will generally be non-linear in $\alpha$.

The maximum-likelihood estimator $\theta_{ML}$ minimizes [2]


with 

k=l 

Equation (10) leads to a heuristically appealing interpretation of the Doppler effect for rational random processes in terms of pole displacements. Consider, for example, the degenerate case where the poles in the $s$-plane are purely imaginary, $s_k = j\Omega_k$; then, in the $z$-plane, the Doppler effect rotates the poles on the unit circle by a factor $\alpha$. This interpretation is consistent with the analysis of
the  Doppler  effect  for  deterministic  harmonic  signals.  It  is 

¹That is, all but a negligible fraction of its energy is contained in the band $[-W, W]$.


$L(\theta) = \log|\Sigma_y(\theta)| + \mathrm{tr}[\Sigma_y^{-1}(\theta) S]$, (12)


where $S = yy^*$ is the sample covariance matrix. In general, the minimization of $L(\theta)$ is a non-linear problem, and it is necessary to resort to iterative methods. However, for large $N$, the computational load of the matrix operations needed to evaluate (12) can be greatly reduced by taking advantage of the Toeplitz structure of $\Sigma_y(\theta)$. It can be shown [3] (see also [4]) that the minimization of $L(\theta)$ is asymptotically equivalent to the minimization of


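$\sum_{k=0}^{N-1} \left[ \log S_y\!\left(\tfrac{2\pi k}{N};\theta\right) + \frac{I_N\!\left(\tfrac{2\pi k}{N}\right)}{S_y\!\left(\tfrac{2\pi k}{N};\theta\right)} \right]$ (13)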
where $S_y(\Omega;\theta)$ is the PSD of $y[n]$ considered as a function of $\theta$, and $I_N(\Omega) = \frac{1}{N}\left|\sum_{k=0}^{N-1} y[k]e^{-j\Omega k}\right|^2$ is the periodogram spectral estimate computed from $y$. Once $I_N(\Omega)$ has been computed (by FFT if $N$ is a power of 2), the evaluation of $L(\theta)$ by (13) requires only $O(N)$ operations. The minimization of (13) can be viewed as the minimization of the Itakura-Saito distance between the empirical spectrum and the parametric model of the data. The ML estimator can thus be interpreted as a "spectral matching" estimator based on the Itakura-Saito distance.
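The spectral form of the likelihood can be illustrated with a short sketch. It is not the authors' implementation: the periodogram is computed with a direct DFT to stay self-contained (an FFT would be used in practice), and `psd_model` is a hypothetical callable standing in for $S_y(\Omega;\theta)$. The demonstration uses a flat (white-noise) spectrum, an assumption chosen only because its minimizer is known in closed form.

```python
import cmath
import math


def periodogram(y):
    """I_N(2*pi*k/N) = |sum_n y[n] exp(-j*2*pi*k*n/N)|^2 / N for k = 0..N-1.

    Direct O(N^2) DFT, used here only to keep the sketch dependency-free.
    """
    n = len(y)
    out = []
    for k in range(n):
        acc = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(acc) ** 2 / n)
    return out


def whittle_criterion(per, psd_model, theta):
    """Sum over the DFT bins of log S_y + I_N / S_y (the spectral criterion)."""
    n = len(per)
    total = 0.0
    for k in range(n):
        s = psd_model(2.0 * math.pi * k / n, theta)  # hypothetical model PSD
        total += math.log(s) + per[k] / s
    return total


# Illustration with an assumed flat spectrum S_y(Omega; theta) = theta.
y = [1.0, -2.0, 0.5, 3.0, -1.5, 0.25, 2.0, -0.75]
per = periodogram(y)
flat_psd = lambda omega, theta: theta
theta_best = sum(per) / len(per)  # closed-form minimizer for the flat model
```

For the flat model the criterion reduces to $N\log\theta + \sum_k I_N(2\pi k/N)/\theta$, so its minimizer is the mean of the periodogram; a Doppler-dependent model would instead be minimized numerically over $\theta$.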

If an approximate value or a range for the Doppler shift $\alpha$ is known, it is possible to obtain an approximate ML estimator with a lower complexity. Consider the Taylor series expansion of $\Sigma_y(\theta)$ around some value $\alpha_0$ of $\alpha$. For example, for $v \ll c$ we have $\alpha \approx 1$, justifying an expansion about $\alpha_0 = 1$. If we restrict the Taylor series expansion to the first two terms, it is trivial algebra to show that $\Sigma_y(\theta)$ can be expressed as a linear combination $\Sigma_y(\theta) \approx \gamma A + \phi B$ of two $N \times N$ Toeplitz matrices $A = (a[i-j])$ and $B = (b[i-j])$, with


$a[k] = R_{x_c}(\alpha_0 k T_s) - \alpha_0 b[k]$,
$b[k] = \left.\dfrac{dR_{x_c}(\alpha k T_s)}{d\alpha}\right|_{\alpha=\alpha_0}$

continuous and differentiable, the MoM estimate can be shown to be consistent by Theorem 3.14 in [7].
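The two-lag method of moments can be sketched as follows, under the assumption that the theoretical lags are $R_y[0] = \sigma^2 R_{x_c}(0)$ and $R_y[1] = \sigma^2 R_{x_c}(\alpha T_s)$. The covariance function `r_x`, the bracketing interval and the use of bisection are illustrative choices, not taken from the paper; bisection presumes that $R_{x_c}$ is monotone on the bracket, which is the stronger identifiability condition mentioned above.

```python
import math


def mom_estimate(r0, r1, r_x, ts, lo, hi, iters=60):
    """Two-lag method-of-moments sketch.

    r0, r1 : sample covariances of y at lags 0 and 1
    r_x    : assumed covariance function of the continuous-time process
    ts     : sampling period
    lo, hi : bracket for alpha on which r_x(alpha * ts) is monotone
    Returns (alpha_hat, sigma2_hat).
    """
    sigma2 = r0 / r_x(0.0)                       # lag-0 moment equation
    g = lambda a: r1 - sigma2 * r_x(a * ts)      # lag-1 moment equation
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("bracket does not contain a root")
    for _ in range(iters):                       # plain bisection
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi), sigma2


# Hypothetical exponential covariance, true alpha = 1.1, sigma^2 = 2.
r_x = lambda tau: math.exp(-abs(tau))
alpha_hat, sigma2_hat = mom_estimate(2.0, 2.0 * math.exp(-0.55), r_x, 0.5, 0.5, 2.0)
```

With exact moments the sketch recovers the parameters used to generate them; with sample covariances the result would serve only as an initial value for an iterative method.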

To relax this strict identifiability condition, an alternate moment-based estimate can be obtained by equating a set of $K$ moments, $K \le N - 1$, and then solving in the least-squares sense:


$\hat\theta_{LSMoM} = \arg\min_{\theta} \sum_{k=0}^{K-1} \left\| \hat R_y[k] - R_y(\theta)[k] \right\|^2$ (16)


Under an identifiability condition for the first $K$ lags, the LSMoM estimator can be shown to be consistent (by Theorem 3.14 in [7] again). From Parseval's theorem, it is not difficult to see that, for large $K$, (16) is equivalent to minimizing the $L_2$ distance between the theoretical PSD $S_y(\Omega;\theta)$ and a windowed periodogram spectral estimate. If the linear approximation of the covariance

$R_y(\theta)[k] \approx \gamma a[k] + \phi b[k]$ (17)

is used, we obtain a linear least-squares problem. Equivalently, linearization of the PSD $S_y(\Omega;\theta)$ could be used with an $L_2$ "spectral matching" criterion.
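As an illustration of the linearized least-squares fit, the sketch below solves the two-parameter normal equations for $(\gamma, \phi)$ and maps them back through $\gamma = \sigma^2$, $\phi = \sigma^2\alpha$. The Gaussian-shaped covariance $R_{x_c}(\tau) = e^{-\tau^2/2}$, the sampling period and the number of lags are assumptions made only for the example.

```python
import math


def gaussian_cov_terms(alpha0, ts, n_lags):
    """a[k], b[k] of the linearization for the assumed R_x(tau) = exp(-tau^2 / 2)."""
    a, b = [], []
    for k in range(n_lags):
        t = k * ts
        r = math.exp(-0.5 * (alpha0 * t) ** 2)
        bk = -alpha0 * t * t * r          # d/d(alpha) of R_x(alpha * t) at alpha0
        b.append(bk)
        a.append(r - alpha0 * bk)
    return a, b


def linearized_lsmom(r_hat, a, b):
    """Least-squares fit r_hat[k] ~ gamma * a[k] + phi * b[k]; returns (alpha, sigma2)."""
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * z for x, z in zip(a, b))
    sar = sum(x * r for x, r in zip(a, r_hat))
    sbr = sum(x * r for x, r in zip(b, r_hat))
    det = saa * sbb - sab * sab
    gamma = (sbb * sar - sab * sbr) / det
    phi = (saa * sbr - sab * sar) / det
    return phi / gamma, gamma


# Exact lags for alpha = 1.02, sigma^2 = 2, linearized about alpha0 = 1.
ts, n_lags = 0.5, 5
r_true = [2.0 * math.exp(-0.5 * (1.02 * k * ts) ** 2) for k in range(n_lags)]
a, b = gaussian_cov_terms(1.0, ts, n_lags)
alpha_hat, sigma2_hat = linearized_lsmom(r_true, a, b)
```

Because only the first-order Taylor term is kept, the recovered $\alpha$ is close to, but not exactly, the value used to generate the lags; the gap grows as $\alpha$ moves away from $\alpha_0$.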


4.3.  Rational  modeling  estimator 


and the reparameterization $\theta = \{\gamma, \phi\}$ where $\gamma = \sigma^2$ and $\phi = \sigma^2\alpha$. Even with this linear approximation, the minimization of (12) is still a non-linear problem and still requires an iterative solution, but a simpler one than the original (see [1]).

4.2.  Method  of  moments  estimator 

The iterative maximum-likelihood methods require an initial estimate of the parameters. The method of moments (MoM) can be used to provide such an estimate. The MoM estimate $\hat\theta_{MoM}$ is obtained by equating sample moments $\hat R_y[k]$ of $y$ with their theoretical values expressed as functions of $\theta$. Using the first two covariance lags yields

$\hat\sigma^2_{MoM} = \dfrac{\hat R_y[0]}{R_{x_c}(0)}$ (14)

$0 = \hat R_y[1] - \hat\sigma^2_{MoM} R_{x_c}(\hat\alpha_{MoM} T_s)$. (15)

Equation (15) needs to be solved numerically, unless a Taylor expansion similar to that of the previous section is used for $R_{x_c}(\alpha T_s)$. Note that using such a linear expansion is equivalent to performing only the first iteration of a Newton-Raphson algorithm for the solution of (15). The existence of a solution to (15) is guaranteed by the "maximization at 0" property of covariance functions and of covariance sequences, if $R_{x_c}(\tau)$ is continuous and in $L_2$. Note that the MoM method requires a stronger identifiability condition than the one introduced earlier, i.e., that $R_{x_c}(\alpha T_s) = R_{x_c}(\alpha' T_s)$ implies $\alpha = \alpha'$. If, in addition, $R_{x_c}(\tau)$ (and hence $R_y[k]$ viewed as a function of $\alpha$) is


Assume that a rational model (AR or ARMA) is available for $S_x(\Omega)$, and let $p_k$, $k = 1,\ldots,p$ be its poles. Further assume that $x[k]$ arises from an $x_c(t)$ that is essentially bandlimited. An ad-hoc estimator based on the rational modeling approach can be obtained as follows. An AR or ARMA model can be fitted to $y$ with classical methods (e.g., Yule-Walker). For ARMA models, only the AR part (poles) is of importance. Let $\tilde p_k$, $k = 1,\ldots,p$ be these poles. From (10), it follows that they are related to the original poles by

$\tilde p_k = p_k^{\alpha}, \quad k = 1,\ldots,p.$ (18)

Minimization of

$\sum_{k=1}^{p} \left\| \log \tilde p_k - \alpha \log p_k \right\|^2$ (19)

with respect to $\alpha$, which is a trivial linear problem, provides the rational modeling estimator of the Doppler shift $\hat\alpha_{RM}$, and $\hat\sigma_{RM}$ can be taken equal to $\hat\sigma_{MoM}$. Note that, like the ML and LSMoM estimates, the RM estimate can be interpreted as a "spectral matching" estimator with the particular spectral distance defined by (19).
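The linear problem has a closed-form solution, sketched below. The sketch assumes that the fitted poles have already been paired with the model poles in matching order and that the principal branch of the complex logarithm is appropriate (pole angles scaled by $\alpha$ stay within $(-\pi, \pi]$); both are assumptions of this example rather than statements from the paper.

```python
import cmath


def rm_doppler_shift(model_poles, fitted_poles):
    """Real alpha minimising sum_k |log(fitted_k) - alpha * log(model_k)|^2.

    Setting the derivative with respect to alpha to zero gives
    alpha = sum Re(conj(log p_k) * log q_k) / sum |log p_k|^2.
    """
    num = 0.0
    den = 0.0
    for p, q in zip(model_poles, fitted_poles):
        lp, lq = cmath.log(p), cmath.log(q)
        num += (lp.conjugate() * lq).real
        den += abs(lp) ** 2
    return num / den


# Hypothetical 4th-order model: two complex-conjugate pole pairs.
model = [cmath.rect(0.9, 0.3), cmath.rect(0.9, -0.3),
         cmath.rect(0.8, 1.0), cmath.rect(0.8, -1.0)]
# Poles displaced by a Doppler shift alpha = 1.05 (p -> p**alpha).
shifted = [cmath.exp(1.05 * cmath.log(p)) for p in model]
alpha_rm = rm_doppler_shift(model, shifted)
```

In practice the fitted poles come from an AR fit to noisy data, so the pairing step and the sensitivity of the logarithm for poles near the origin deserve attention.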

4.4. Cramér-Rao bound

It can be shown that the Cramér-Rao bound (CRB) for the estimation of $\alpha$ and $\sigma$ is given by

CRB(q;)  =  2  [ill'll!, -AT-I  (20) 

2 

CRB((t)  =  ^  11*11^  CRB(a),  (21) 
If $\sigma$ is known, Whittle's asymptotic version of the CRB for $\alpha$ takes the form

$\mathrm{CRB}(\alpha) \asymp \left[ \dfrac{N}{4\pi} \displaystyle\int_{-\pi}^{\pi} \left( \dfrac{\partial S_y(\Omega)/\partial\alpha}{S_y(\Omega)} \right)^{2} d\Omega \right]^{-1}$ (22)


which has a nice intuitive interpretation. The CRB depends on the "sensitivity" of the PSD of $y[n]$ to variations in $\alpha$, this sensitivity being weighted by the PSD itself (i.e., the distribution of energy in the spectral domain).


5.  Preliminary  results 

In  order  to  evaluate  the  performance  of  the  proposed  esti¬ 
mators,  we  conducted  several  Monte-Carlo  simulations.  We 
used  a  4th  order  linear  model  for  the  continuous-time  pro¬ 
cess  x{t),  i.e., 


Figure 2. Variance of the estimators of the Doppler shift $\alpha$.


^Xci^)  10“  1754^^3  X 10- +7.02  X10“®S^-|-1. 2  Xl0“^5+1  ’ 

The  sampling  frequency  Fs  was  set  equal  to  40860  Hz.  The 
process  was  simulated  via  the  rational-modeling  approach 
described  in  Sec.  3. 

Monte-Carlo simulations have been performed in Matlab, with 1000 independent runs for each estimator. Figure 2 summarizes the results obtained for the MoM estimator, the rational-modeling estimator, and the LSMoM estimator of the Doppler shift $\alpha$. For the rational-modeling method, the least-squares modified Yule-Walker algorithm [7] has been used to compute the poles of the rational model. Five covariance lags were used in the LSMoM method with the linearization of the covariance (17) about $\alpha_0 = 1$. The ML estimator suffered from numerical convergence problems and its results are not included here.

Of the three estimators, the linearized LSMoM estimator has the lowest computational cost, followed by the rational-modeling estimator, and the MoM estimator (because of the numerical resolution of (15)). For samples of moderate and large size $N$, the rational modeling estimator of the Doppler shift $\alpha$ outperforms the two moment-based estimators, which have very close variances. For the estimation of the gain $\sigma$, all three methods offered very similar performance. More complete results, including performance on real Doppler-shifted acoustic data, will be presented at the conference.

6.  Concluding  remarks 

Various estimators of the Doppler parameters for Doppler-shifted Gaussian random processes have been proposed. All these estimators rely on a stationarity assumption of the Gaussian process. In the context of Doppler shift for moving sources, this amounts to assuming that the wave source is moving at a constant radial speed toward (or away from) the fixed receiver. If the source is not moving at constant radial speed, the random process $y(t)$ is no longer stationary. The approach proposed here can be extended to treat


this case. For example, an adaptive ARMA modeling technique could be easily combined with the rational modeling approach to track the evolution of the Doppler parameters in time. Likewise, the ML and moment-based methods (or their "spectral matching" equivalent) could be applied to a sliding window of adequate length so as to ensure quasi-stationarity of the signal. The resulting estimates of the Doppler shift $\alpha(t)$ and gain $\sigma(t)$ could be further smoothed by using a priori knowledge on the movement of the source, if such knowledge is available.

Another  approach  would  be  to  use  a  non-stationary  rep¬ 
resentation  of  the  signal.  An  appealing  candidate  for  this 
non-stationary  representation  is  the  wavelet  transform  (as 
suggested  in  [8]):  moving  noise  sources  result  in  dilatations 
of  the  time  axis  and  propagation  delays,  which  are  exactly 
the  operations  (translation  and  dilatation)  used  in  the  defi¬ 
nition  of  the  wavelet  transform. 
Abstract

This paper is concerned with the estimation of multiple constant velocity target trajectories in a low SNR environment. Each target trajectory is characterised by the initial position and velocity, which are to be estimated. A major difficulty is that the target amplitudes are unknown and will in general be time varying. The approach in this paper is to model the target amplitude as an autoregressive (AR) process. A maximum likelihood estimator is derived for the parameters of the AR process and the unknown target position and velocity using the expectation conditional maximisation (ECM) algorithm.


1.  Introduction 

This  paper  is  concerned  with  the  estimation  of  the 
trajectories  of  multiple  constant  velocity  targets  ob¬ 
served  using  an  optical  sensor,  recording  2-D  images. 
In  a  low  SNR  environment,  target  locations  cannot  be 
estimated  using  a  single  image,  so  a  number  of  frames 
must  be  recorded  and  processed.  At  the  end  of  this 
observation  interval  estimates  of  all  target  trajectories 
are  obtained. 

Previously  the  single  target  problem  has  been  for¬ 
mulated  in  [1,  2,  3,  4]  as  either  a  maximum  likelihood 
estimation  problem,  or  frequency  domain  matched  fil¬ 
ter  problem.  The  amplitude  is  either  assumed  constant 
[1,  2,  3],  or  completely  unknown  and  therefore  uncorre- 

*This  work  was  supported  by  the  Co-operative  Research  Cen¬ 
tre  for  Sensor  Signal  and  Information  Processing  (CSSIP).  A. 
Logothetis  is  supported  by  the  Australian  Telecommunications 
and  Electronics  Research  Board  (ATERB) 


lated  from  pixel  to  pixel  [4].  In  addition,  a  discrete  set 
of  candidate  target  velocities  is  tested,  resulting  in  per¬ 
formance  loss  in  the  presence  of  a  mismatch  between 
assumed  and  actual  target  velocity. 

In  general  the  target  amplitude  is  time  varying.  Re¬ 
cently  in  [5]  a  first  order  model  is  proposed  for  the  time 
varying  amplitude,  but  the  mean,  variance  and  corre¬ 
lation  from  one  time  to  the  next  are  assumed  known 
a-priori.  A  numerical  optimisation  procedure  is  used 
to  obtain  continuous  estimates  of  target  position  and 
velocity,  overcoming  the  problem  of  mismatch. 

This paper extends the formulation in [5] to account for multiple targets. The time varying amplitudes of the targets are modeled as independent first order autoregressive (AR) processes, similar to [5], but the AR process parameters are assumed unknown a priori. A maximum likelihood estimator, using the expectation conditional maximisation (ECM) algorithm, is derived to simultaneously estimate the parameters of the AR processes and the unknown target positions and velocities.

2.  Problem  Formulation 
2.1.  Signal  Model 

There are $N$ constant velocity targets, with time varying amplitude. For simplicity, $N$ is assumed known a priori, but in practice would also need to be estimated. The time varying amplitude for the $n$th target, denoted $a^{(n)}(k)$, is modeled as a first order AR series, given by

$a^{(n)}(k) = d^{(n)} a^{(n)}(k-1) + (1 - d^{(n)})\,\bar a^{(n)}(k-1) + v^{(n)}(k)$ (1)
where $\bar a^{(n)}(k) = \bar a^{(n)}$, $\forall k$, is the constant mean amplitude of the $n$th source (target). The $\{v^{(n)}(k),\ n = 1,\ldots,N,\ k \in \mathbb{N}\}$ are statistically independent zero-mean white Gaussian sequences with variances $\sigma^2_{v(n)}$, so the target amplitude series are independent. The inter-frame amplitude correlation is given by $d^{(n)}$.
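A minimal simulation of one amplitude series is sketched below. It assumes the recursion $a(k) = d\,a(k-1) + (1-d)\,\bar a + v(k)$, which is how the state-transition matrix of the state-space form reads; the function name, the seed handling and the choice to start the series at the mean are illustrative.

```python
import random


def simulate_amplitude(mean, d, var_v, n_frames, seed=0):
    """Simulate a first-order AR amplitude series around a constant mean.

    mean  : constant mean amplitude
    d     : inter-frame correlation coefficient
    var_v : variance of the white Gaussian driving noise
    The series is started at the mean (an assumption of this sketch).
    """
    rng = random.Random(seed)
    a_prev = mean
    series = []
    for _ in range(n_frames):
        v = rng.gauss(0.0, var_v ** 0.5)
        a = d * a_prev + (1.0 - d) * mean + v
        series.append(a)
        a_prev = a
    return series


# A zero-variance target keeps a constant amplitude whatever d is.
constant = simulate_amplitude(mean=3.0, d=1.0, var_v=0.0, n_frames=100)
fluctuating = simulate_amplitude(mean=2.5, d=0.7, var_v=5.0, n_frames=100)
```

The zero-variance case shows why the correlation of a constant-amplitude target cannot be identified: every value of $d$ produces the same series.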

The measurements are images recorded using an optical sensor array consisting of $C \times D$ resolution cells (pixels) of dimension $\Delta_x \times \Delta_y$. The measurement at time $k$ is $Y(k) \in \mathbb{R}^{C \times D}$, the set of observations recorded in all cells at time $k$. The signal from a point source is spread according to a point spread function, approximated here by a 2-D Gaussian. If $(x_0^{(n)}, y_0^{(n)}, v_x^{(n)}, v_y^{(n)})$ represents the initial position (in $x$ and $y$) and initial velocity (in $x$ and $y$) for the $n$th source, then the signal in cell $i, j$ at time $k$ due to source $n$ is $h_{ij}^{(n)}(k)\,a^{(n)}(k)$, where

$h_{ij}^{(n)}(k) = \dfrac{1}{2\pi\sigma_x\sigma_y} \exp\left( -\dfrac{(x_0^{(n)} + kTv_x^{(n)} - i\Delta_x)^2}{2\sigma_x^2} - \dfrac{(y_0^{(n)} + kTv_y^{(n)} - j\Delta_y)^2}{2\sigma_y^2} \right)$ (2)


and $\sigma_x$ and $\sigma_y$ are characteristic for the sensor and assumed known, and $T$ is the sampling interval.
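The per-cell weights of a single target can be computed as in the sketch below, which assumes a 2-D Gaussian point spread function centred on the target position at frame $k$. Cell indices start at 0 here and the function name and argument order are illustrative; the paper's indexing convention is not recoverable from the text.

```python
import math


def psf_frame(x0, y0, vx, vy, k, n_rows, n_cols, sx, sy, dx=1.0, dy=1.0, t=1.0):
    """Gaussian point-spread weights h_ij(k) for one constant-velocity target.

    (x0, y0) initial position, (vx, vy) velocity, k frame index,
    sx, sy PSF widths, dx, dy cell sizes, t sampling interval.
    Rows index the x direction and columns the y direction (0-based).
    """
    xk = x0 + k * t * vx          # target position at frame k
    yk = y0 + k * t * vy
    norm = 1.0 / (2.0 * math.pi * sx * sy)
    return [[norm * math.exp(-(xk - i * dx) ** 2 / (2.0 * sx ** 2)
                             - (yk - j * dy) ** 2 / (2.0 * sy ** 2))
             for j in range(n_cols)]
            for i in range(n_rows)]


# A target in the middle of a 20 x 20 array with PSF width 0.6 cells.
frame = psf_frame(10.0, 10.0, 0.0, 0.0, 0, 20, 20, 0.6, 0.6)
total_weight = sum(sum(row) for row in frame)
```

With unit cells the weights sum to approximately one when the target is well inside the array, so the amplitude is spread over a few neighbouring cells rather than lost.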

The measurement in cell $i, j$ at time $k$ is given by the weighted sum of the time varying amplitudes embedded in white noise, given by

$y_{ij}(k) = \sum_{n=1}^{N} h_{ij}^{(n)}(k)\,a^{(n)}(k) + e_{ij}(k)$ (3)

where $e_{ij}(k)$ are statistically independent zero-mean white Gaussian processes with known variance $\sigma_e^2$. Eqs (1) and (3) can be written in a state space form as follows

$x(k) = A x(k-1) + B v(k)$
$z(k) = G(k) x(k) + w(k)$ (4)


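$B^{(n)} = (1\ \ 0\ \ 0)^T$
$v(k) = \mathrm{vec}(v^{(1)}(k), \ldots, v^{(N)}(k))$
$z(k) = \mathrm{vec}(Y(k))$
$Y(k) = [y_{ij}(k)]$
$G(k) = (\mathrm{vec}(H^{(1)}(k)), \ldots, \mathrm{vec}(H^{(N)}(k)))\,B'$
$H^{(n)}(k) = [h_{ij}^{(n)}(k)]$
$w(k) = \mathrm{vec}(E(k))$
$E(k) = [e_{ij}(k)]$ (5)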

and $x(k)$ is of dimension $3N \times 1$, $A$ is $3N \times 3N$, $B$ is $3N \times N$, $v(k)$ is $N \times 1$, $z(k)$ is $CD \times 1$, $G(k)$ is $CD \times 3N$ and $w(k)$ is $CD \times 1$.

The total number of images (observations) recorded is $K$, and the observation sequence $(z(1), \ldots, z(K))$ is denoted $Z_K$. The state sequence $(x(1), \ldots, x(K))$ is denoted $X_K$.

2.2.  Estimation  Objective 

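Let $\theta_0 = (\theta_1, \theta_2, \ldots, \theta_S) \in \Theta$ denote the true model, such that $\theta_1 = (\sigma^2_{v(1)}, \ldots, \sigma^2_{v(N)})$, $\theta_2 = (d^{(1)}, \ldots, d^{(N)})$, and $\theta_3$ to $\theta_S$ represent $(x_0^{(n)}, y_0^{(n)}, v_x^{(n)}, v_y^{(n)})$, $n = 1, \ldots, N$. The value of $S$ depends on how the parameter space $\Theta$ is partitioned, and $S \in \{3, \ldots, 4N+2\}$.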

Given a realization $\{z(1), \ldots, z(K)\}$ of the stochastic model of (4), the objective is to obtain the maximum likelihood (ML) estimate

$\theta^{ML} = \arg\max_{\theta} p(Z_K|\theta)$ (6)

where $p(Z_K|\theta)$ denotes the marginal density of the observations $Z_K$ conditioned on the model $\theta$.


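where

$x(k) = \mathrm{vec}(x^{(1)}(k), \ldots, x^{(N)}(k))$
$x^{(n)}(k) = (a^{(n)}(k),\ a^{(n)}(k-1),\ \bar a^{(n)}(k))^T$
$A = \mathrm{diag}(A^{(1)}, \ldots, A^{(N)})$
$A^{(n)} = \begin{pmatrix} d^{(n)} & 0 & 1 - d^{(n)} \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$B = \mathrm{diag}(B^{(1)}, \ldots, B^{(N)})$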


3.  Proposed  Algorithm 

The ECM algorithm [6] is used to obtain $\theta^{ML}$, meeting the objective in Section 2.2. As a by-product of the ECM, MAP estimates of the states $x(k)$ are also obtained.

The ECM algorithm is an extension of the expectation maximisation (EM) algorithm [7] and is an iterative method of extracting the mode of the likelihood function. From an initial estimate $\theta^{(0)} \in \Theta$ a sequence of estimates $\{\theta^{(l)}\}$ is generated. Let $F = \{f_s(\theta) : s = 1, \ldots, S\}$ be the set of $S$ constraint
functions of $\theta$, with $f_s(\theta) = (\theta_1, \ldots, \theta_{s-1}, \theta_{s+1}, \ldots, \theta_S)$. At the $(l+1)$st iteration ("pass") of the ECM algorithm we perform the following steps:

E-step: Just like the EM, we evaluate the conditional expectation of the log likelihood of the complete data:

$Q(\theta, \theta^{(l)}) = E\{\ln p(Z_K, X_K|\theta) \mid Z_K, \theta^{(l)}\}$. (7)

Here $p$ is the density function of the complete data $M_K = (Z_K, X_K)$ and $\theta^{(l)}$ is the parameter estimate at the $l$th iteration.

CM-steps: For $s = 1, \ldots, S$, find $\theta^{(l+s/S)}$ that maximises $Q(\theta, \theta^{(l)})$ (as a function of $\theta$) subject to the constraint $f_s(\theta) = f_s(\theta^{(l+(s-1)/S)})$, i.e.

$Q(\theta^{(l+s/S)}, \theta^{(l)}) \ge Q(\theta, \theta^{(l)})$ (8)

for all $\theta \in \Theta$ for which $f_s(\theta) = f_s(\theta^{(l+(s-1)/S)})$,


3.2.  CM-Steps 

The constrained maximisation of $Q(\theta, \theta^{(l)})$ consists of the following $S$ steps:

CM-Step 1: Calculate $\theta^{(l+1/S)}$ using (8). This step determines $\sigma^2_{v(n)}$, $n = 1, \ldots, N$. A closed form solution is obtained for each of the $\sigma^2_{v(n)}$ by differentiating $Q(\theta, \theta^{(l)})$ with respect to $\sigma^2_{v(n)}$ and setting the derivative equal to zero.

CM-Step 2: Calculate $\theta^{(l+2/S)}$ using (8). This step determines $d^{(n)}$, $n = 1, \ldots, N$. A closed form solution for each $d^{(n)}$ is obtained in the same way as for CM-Step 1.


where

$\theta^{(l+s/S)} = (\theta_1^{(l+1)}, \ldots, \theta_s^{(l+1)}, \theta_{s+1}^{(l)}, \ldots, \theta_S^{(l)})$.

The appealing property of the ECM algorithm is that likelihoods increase monotonically, i.e., $p(Z_K|\theta^{(l+1)}) \ge p(Z_K|\theta^{(l)})$ with equality holding at the ML estimate, provided the set $F$ of constraints spans the parameter space [6]. The rate of convergence of this algorithm is studied in [8].


3.1.  E-Step 

The evaluation of $Q(\theta, \theta^{(l)})$ requires the density function of the complete data, $p(Z_K, X_K|\theta)$. From the model in (4)

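$p(Z_K, X_K|\theta) = \prod_{k=1}^{K} p(z(k)|x(k),\theta) \prod_{k=1}^{K} p(x(k)|x(k-1),\theta)$
$\qquad = \prod_{k=1}^{K}\prod_{i=1}^{C}\prod_{j=1}^{D} p(y_{ij}(k)|x(k),\theta) \prod_{k=1}^{K}\prod_{n=1}^{N} p(a^{(n)}(k)|x^{(n)}(k-1),\theta)$ (9)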

where  the  conditional  densities  of  and  yij[k) 

are  Gaussian  and  obtained  from  (1)  and  (3),  and  the 
definition  of  Taking  the  conditional  expec¬ 

tation  of  the  log  of  (9)  gives  Q{6,9^^^).  This  re¬ 
quires  the  computation  of  and 

(where $\hat{(\cdot)} = E\{\cdot \mid Z_K, \theta^{(l)}\}$) which are obtained using a Kalman smoother [9]. Due to space constraints, $Q(\theta, \theta^{(l)})$ is not shown in full in this summary.


CM-Steps 3 to S: Calculate $\theta^{(l+s/S)}$, $s = 3, \ldots, S$ using (8). In this step the target position and velocity parameters are updated. In general there will be no closed form solution, so the CM-step will involve some form of iterative search in a parameter space $\Theta' \subset \Theta$. The dimensionality of $\Theta'$ is determined by the dimensionality of $\theta_s$, and is less than that of $\Theta$. In this paper the parameter space is partitioned such that $S = N + 2$ and each $\theta_s$, $s = 3, \ldots, S$ corresponds to the initial position and velocity parameters for one target. This definition means each parameter update requires a 4-dimensional search, which is implemented using a gradient descent technique. This results in a more complex implementation, but faster convergence, than if each $\theta_s$ corresponded to a single position or velocity parameter.
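The structure of one pass of conditional maximisations can be illustrated on a toy problem. The sketch below is not the tracking algorithm: the function `q` is a fixed concave quadratic standing in for $Q(\theta, \theta^{(l)})$, whereas in the real algorithm $Q$ is re-evaluated by the E-step (Kalman smoother) at every pass. Each hypothetical CM-step maximises over one block in closed form while the other block is held fixed.

```python
def q(theta):
    """Toy concave objective standing in for Q(theta, theta_l)."""
    t1, t2 = theta
    return -(t1 - 1.0) ** 2 - (t2 + 2.0) ** 2 - t1 * t2


def cm_step_1(theta):
    """Maximise over block 1 with block 2 fixed: -2(t1 - 1) - t2 = 0."""
    _, t2 = theta
    return (1.0 - 0.5 * t2, t2)


def cm_step_2(theta):
    """Maximise over block 2 with block 1 fixed: -2(t2 + 2) - t1 = 0."""
    t1, _ = theta
    return (t1, -2.0 - 0.5 * t1)


def cm_pass(theta, cm_steps):
    """Apply every conditional maximisation once, in order."""
    for step in cm_steps:
        theta = step(theta)
    return theta


theta = (0.0, 0.0)
history = [q(theta)]
for _ in range(30):
    theta = cm_pass(theta, [cm_step_1, cm_step_2])
    history.append(q(theta))
```

Each conditional maximisation can only increase the objective, so `history` is non-decreasing, which mirrors the monotonicity property stated for the ECM algorithm; coarser partitions (several parameters per block) generally need fewer passes at the price of harder individual steps.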

4.  Simulation 

The procedure has been implemented for the following scenario: The measurement sensor consists of $C \times D = 20 \times 20$ resolution cells of dimension $\Delta_x = \Delta_y = 1$ cell and $\sigma_x = \sigma_y = 0.6$, with a measurement noise variance of $\sigma_e^2 = 1.0$. The total number of frames processed is $K = 100$. There are $N = 3$ targets moving through the field of view, with amplitude statistics, initial positions and constant velocities as given in Table 1. For these trajectories there will be a number of frames for which there is signal from more than one target in some cells. The approximate average SNRs for the cells in which the targets are located are 1.5 dB, 0.6 dB, and 1.8 dB respectively.

The  ECM  algorithm  was  initialised  randomly 
around  the  true  position  and  velocity  parameters,  and 
with  =  0.0,  =  0.25  and  =  0.0.  The  esti¬ 

mated  amplitude  statistics  and  target  initial  positions 
n    $\bar a^{(n)}$   $\sigma^2_{v(n)}$   $d^{(n)}$   $x_0^{(n)}$   $y_0^{(n)}$   $v_x^{(n)}$   $v_y^{(n)}$
1    2.5     5.00     0.7    1.25     1.35     0.17    0.16
2    3.0     0.00     1.0    3.05     12.55    0.14    -0.05
3    2.0     s.ooH    0.0    10.00    16.50    0.0     -0.15

Table 1. True target parameters


and  velocities  obtained  from  a  single  run  are  given  in 
Table  2.  The  recovered  trajectories,  along  with  the 
true  trajectories  are  shown  in  Figure  1. 


n    $\bar a^{(n)}$   $\sigma^2_{v(n)}$   $d^{(n)}$   $x_0^{(n)}$   $y_0^{(n)}$   $v_x^{(n)}$   $v_y^{(n)}$
1    2.78    5.34     0.76   1.17     1.30     0.17    0.16
2    2.98    0.30     0.15   3.21     12.35    0.14    -0.05
3    1.67    9.45     0.05   9.96     16.51    0.00    -0.15

Table 2. Estimated target parameters

The target amplitude parameters and the position and velocity parameters are recovered accurately, even for low SNR targets with crossing trajectories. The correlation ($d$) for target 2 is difficult to recover since a constant amplitude target can be represented by any $d$, with $\sigma_v^2 = 0$.

The rate of convergence of the ECM algorithm deteriorates as the initialisation of position and velocity parameters moves further from the true values. Some form of grid search is required to provide adequate initialisation. The rate of convergence also deteriorates as the SNR decreases. For this example about 30 passes of the ECM algorithm were required.

5.  Conclusion 

This  paper  presents  a  technique  to  estimate  the  tra¬ 
jectories  of  multiple  constant  velocity  targets  with  time 
varying  amplitudes,  observed  with  an  optical  sensor. 
The  ECM  algorithm  is  used  to  obtain  the  ML  estimate 
of  the  target  trajectories  and  MAP  estimates  of  target 
amplitudes.  The  technique  has  been  applied  success¬ 
fully  to  crossing  targets  in  low  SNR  conditions. 
ABSTRACT

Estimation of a parametric input-output (IO) infinite impulse response (IIR) transfer function is considered. Some of the desirable properties of any approach to this problem are: unimodality of the performance surface, consistent identification in the sufficient-order case, and stability of the fitted model under undermodeling. Some of the well-known approaches fail to satisfy one or more of these properties. The time-domain equation error method (EEM) yields a unimodal performance surface, biased estimates in colored noise and sufficient-order case, and stable fitted models under undermodeling if the input is autoregressive. In this paper we propose a frequency-domain solution to the least-squares equation error identification problem using the power spectrum and the cross-spectrum of the IO data to estimate the IO parametric transfer function. The proposed approach is shown to yield a unimodal performance surface, consistent identification in colored noise and sufficient-order case, and stable fitted models under undermodeling for arbitrary stationary inputs so long as they are persistently exciting of sufficiently high order.

1  Introduction 

Consider  the  following  widely  used  input-output  linear 
system  model: 


$y(t) = H(q^{-1})u(t) + v(t)$ (1-1)

where $\{u(t)\}$ is the measured input sequence, $t$ is discrete-time, $\{y(t)\}$ is the noisy output, and $\{v(t)\}$ is a measurement noise (disturbance) sequence. With $q^{-1}$ denoting the backward-shift operator (i.e. $q^{-1}u(t) = u(t-1)$), the linear system $H(q^{-1})$ represents an IIR (infinite impulse response) system:

$H(q^{-1}) = \sum_{i=0}^{\infty} h_i q^{-i}$ (1-2)


Given an input-output record $\{u(t), y(t),\ t = 1, 2, \ldots, T\}$ but the underlying true system model $H(q^{-1})$ unknown, it is of much interest in signal processing, communications and control applications to fit a rational function model

$G(q^{-1}) = \dfrac{B(q^{-1})}{A(q^{-1})} = \dfrac{\sum_{i=0}^{n_b} b_i q^{-i}}{1 + \sum_{i=1}^{n_a} a_i q^{-i}}$ (1-3)


This  work  was  supported  by  the  National  Science  Founda¬ 
tion  under  Grant  ECS-9504878. 


to given input-output record [1]-[6],[8]. A wide variety of approaches exist [1],[4],[5],[8].

In any model fitting and parameter estimation problem, key issues influencing the choice of the approach are [1],[4],[5],[8]:

(i) Global Convergence: Unimodality of the cost function. Does there exist a unique global asymptotic convergence point? For instance, the prediction error method (PEM) and the output error method (OEM) [4],[5] do not have a unimodal cost function, in general, whereas the equation error method (EEM) [1], the Steiglitz-McBride method (SMM), and the instrumental variable method (IVM) [4],[5],[8] all have a unique global asymptotic convergence point.

(ii) Consistency: If the model set (i.e. the set from which the fitted model is selected) contains the true system (the so-called sufficient order case), does the fitted model asymptotically converge to the true model? Ignoring the lack of unimodality, PEM is consistent under a broad class of conditions [4],[5] and so is IVM, but SMM and OEM are so only for white measurement noise and EEM (as modified in [1]) has similar limitations.

(iii) Statistical Efficiency: What is the variance (and bias) of the fitted parameters? If it converges to the correct solution, PEM is known to yield the smallest variance [4],[5].

(iv) Reduced-Order Modeling (Undermodeling): When the true system does not belong to the model set (for instance, suppose that $H(q^{-1})$ is not a rational function), it is meaningless to talk of consistency. From a practical viewpoint, a key issue now is whether the fitted model is stable like the underlying true model. It turns out that only EEM leads to a reduced-order stable model provided that the input $\{u(t)\}$ is an AR process. Indeed, it is noted in [1] that "...if the input can not be ascertained autoregressive, equation error methods should perhaps not be used."

The main objective of this paper is to provide a frequency-domain solution to the problem of equation error (least-squares) system identification using spectral analysis. The proposed method is shown to lead to a unimodal performance surface, consistent identification in colored noise and sufficient-order case, and stable fitted models under undermodeling for arbitrary stationary inputs so long as they are persistently exciting of sufficiently high order.

2  Model  Assumptions 

We  impose  the  following  conditions  on  (1-1): 

(AS1) $\{u(t)\}$ and $\{y(t)\}$ are zero-mean and jointly stationary. The power spectral density $S_{uu}(e^{j\omega})$ of $\{u(t)\}$ is $> 0$ for almost all $\omega \in [0, \pi]$.

(AS2) The true system transfer function $H(q^{-1})$ is causal and stable. Therefore, $\sum_{i=0}^{\infty} |h_i| < \infty$.

(AS3) The noise sequence $\{v(t)\}$ is zero-mean, stationary and independent of $\{u(t)\}$.

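(AS4) The following summability conditions hold true:

$\sum_{\tau_1,\ldots,\tau_{k-1}=-\infty}^{\infty} [1 + |\tau_j|]\, |C_{z_1 \cdots z_k}(\tau_1, \ldots, \tau_{k-1})| < \infty$,

for each $j = 1, \ldots, k-1$ and each $k = 2, 3, \ldots$, where $z_i(t) \in \{y(t), u(t), v(t)\}$ and $C_{z_1 \cdots z_k}(\tau_1, \ldots, \tau_{k-1})$ denotes the $k$th-order joint cumulant of the random variables $\{z_1(t+\tau_1), \ldots, z_{k-1}(t+\tau_{k-1}), z_k(t)\}$.

Let the vector of unknown parameters be given by

$\theta = [a_1 \cdots a_{n_a}\ \ b_0 \cdots b_{n_b}]^T$. (2-1)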

3  A  Frequency-Domain  Solution 

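Consider the cross-spectral density

$S_{yu}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} E\{y(t+k)u(t)\}\, e^{-j\omega k}$ (3-1)

It then follows easily that

$H(e^{j\omega}) = S_{yu}(e^{j\omega})\, S_{uu}^{-1}(e^{j\omega})$ (3-2)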

The basic approach to model parameter estimation consists of two steps. First obtain a consistent estimator $\hat H(e^{j\omega})$ of $H(e^{j\omega})$ via consistent estimators $\hat S_{yu}(\omega)$ and $\hat S_{uu}(\omega)$ of $S_{yu}(\omega)$ and $S_{uu}(\omega)$, respectively, based upon the input-output record $\{u(t), y(t),\ t = 1, 2, \ldots, T\}$. Next estimate the system parameters using the estimated transfer function matrix as "data."

3.1  Transfer  Function  Estimator  and  Its 
Statistics 

This involves little more than estimation of the cross-spectrum between $\{y(t)\}$ and $\{u(t)\}$, and of the power spectrum of $\{u(t)\}$. Numerous techniques are available for this purpose; see [7] and references therein. We will follow the approach of smoothing in the frequency domain [7, Sec. 7.4]. Given a record of length $T$, let $Y(\omega_k)$ denote the DFT of $\{y(t),\ 1 \le t \le T\}$ given by

$Y(\omega_k) = \sum_{t=0}^{T-1} y(t+1)\exp(-j\omega_k t)$ (3-3)

where

$\omega_k = \dfrac{2\pi}{T}k, \quad k = 0, 1, \ldots, T-1.$ (3-4)

Similarly define $U(\omega_k)$.

Given  the  above  DFT’s,  following  [7,  Sec.  7.4]  we  define 
the  cross-  and  auto-spectrum  estimators  as 

Svu{k)  =  E  (3  -  5) 

J  =  1 

and 

5«u(fe)  =  ^  E  (3  -  6) 

for $1 \le k \le T - 1$, where the scalar weighting function $W^{(T)}(\alpha)$ is given by

$W^{(T)}(\alpha) = \sum_{j=-\infty}^{\infty} B_T^{-1}\, W(B_T^{-1}[\alpha + 2\pi j])$ (3-7)

such that $W(\beta)$, $-\infty < \beta < \infty$, is real-valued, even, of bounded variation satisfying $\int_{-\infty}^{\infty} W(\beta)\,d\beta = 1$ and $\int_{-\infty}^{\infty} |W(\beta)|\,d\beta < \infty$. It is convenient to take $W(\beta) = 0$ for $|\beta| > 2\pi$ and $W(\beta) = (4\pi)^{-1}$ for $|\beta| \le 2\pi$. In this case (3-5) involves uniform weighting of the $2B_T T + 1$ cross-periodogram ordinates whose frequencies fall in the interval $(\omega_k - 2\pi B_T,\ \omega_k + 2\pi B_T)$. Thus (3-5) reduces to

where $m_T = B_T T$. Similar modification holds for (3-6). Let us choose $B_T$ to be such that as $T \to \infty$, we have $B_T \to 0$ and $B_T T \to \infty$. Let $k_1(T)$ with $T = 1, 2, \ldots$ be a sequence of integers such that $\lim_{T\to\infty} k_1(T)/T = f_1$, a fixed frequency (in Hz).

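In light of (3-8) define a coarser frequency grid:

$$\omega_l = \frac{2\pi l}{L_T} = \frac{2\pi l(2m_T + 1)}{T} = \frac{2\pi l(2B_T T + 1)}{T} \qquad (3\text{-}9)$$

with $l = 0, 1, \ldots, L_T - 1$ where $L_T = T/(2m_T + 1)$. Using
the estimated spectra we have an estimator of the system
transfer function at frequency $\omega_k$ (as in [7, Chapter 8])

$$\hat{H}(e^{j\omega_k}) = \hat{S}_{uu}^{-1}(k)\,\hat{S}_{yu}(k) \qquad (3\text{-}10)$$

provided that $\hat{S}_{uu}^{-1}(k)$ exists. If $S_{uu}^{-1}(\omega_k)$ exists, then it
follows from [7, Thm. 8.11.1] that

$$\lim_{T\to\infty} \hat{H}(e^{j\omega_{k(T)}}) = \lim_{T\to\infty} \hat{S}_{uu}^{-1}(k(T))\,\hat{S}_{yu}(k(T)) = H(e^{j2\pi f}) \quad \text{w.p.1} \qquad (3\text{-}11)$$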

where $\lim_{T\to\infty} k(T)/T = f$. Convergence in (3-11) is uniform
in $f$. Finally, by the asymptotic independence of the
periodogram and cross-periodogram on the grid (3-4) for
$0 \le k \le T/2$ (see also [7, Chap. 7]), it follows that
$\hat{S}_{yu}(k)$ and $\hat{S}_{uu}(k)$ for $k = l(2m_T + 1)$, $l = 0, 1, \ldots, (L_T/2) - 1$, are
(asymptotically) independent. It follows from [7, Thm. 8.8.1] that
$\hat{H}(e^{j\omega_k})$ for $k = l(2m_T + 1)$, $l = 0, 1, \ldots, (L_T/2) - 1$,
are (asymptotically) jointly complex (circularly symmetric)
Gaussian such that


limT-.oo  At  cov 


=  L _ |5yu(tjfe)|^  1 

S-u‘u(t*^/e)  ^yy(j^k)Suv.{}*^k)\ 

UmT^co  At  cov 
where 


(3  -  12) 
=  0,  (3  -  13) 


(if  (3-8)  is  used). 

(3  -  14) 

and $\mathrm{cov}\{X, Y\} = E\{XY^*\} - E\{X\}E\{Y^*\}$. Thus, $\hat{H}(e^{j\omega_k})$
on the coarse grid (3-9) is asymptotically a complex Gaussian
(in the sense of [7, Sec. 4.2]) random variable, independent
at distinct frequencies on the coarse grid over $(0, \pi)$,
with the covariance structure (3-12).

3.2  An  Equation  Error  Formulation 

It follows from the definition of $H(e^{j\omega})$ (cf. (1-3)) that

’T’tt  Tl^ 

i=l  i=0 

.(3-15) 

for any $\omega_k$. We rewrite (3-15) after replacing the true
transfer function $H(e^{j\omega_k})$ with its estimate $\hat{H}(e^{j\omega_k})$ (see (3-10)),
as


rift  rtj, 

t=l  1=0 

(3  -  16) 

Using frequencies $\omega_k = 2\pi(k - 1)/L_T$ for $0 < k \le L =
(L_T/2) - 1$, (3-16) may be rewritten in a matrix-equation
form and a least-squares solution can be found, as in [3]
(but in a different context). One may also wish to split (3-16)
into its real and imaginary parts and then solve it in
order to preserve the real-valued nature of the parameters
(see [3]).
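The matrix form of the equation-error fit can be illustrated as follows. This is a sketch under stated assumptions, not the formulation of [3]: the relation $A(e^{j\omega})\hat{H}(e^{j\omega}) = B(e^{j\omega})$ with $A = 1 + \sum_{i=1}^{n_a} a_i e^{-j\omega i}$ and $B = \sum_{i=0}^{n_b} b_i e^{-j\omega i}$ is rearranged so that the unknowns appear linearly, and the real and imaginary parts are stacked so that the solution is real-valued. The name `equation_error_fit` and the first-order test system are hypothetical.

```python
import numpy as np

def equation_error_fit(H_hat, w, na, nb):
    """Hypothetical helper: least-squares fit of a_1..a_na and b_0..b_nb in
    A(e^{jw}) H_hat(e^{jw}) = B(e^{jw}) at the frequencies w."""
    Ea = np.exp(-1j * np.outer(w, np.arange(1, na + 1)))   # e^{-jwi}, i = 1..na
    Eb = np.exp(-1j * np.outer(w, np.arange(0, nb + 1)))   # e^{-jwi}, i = 0..nb
    # H_hat = -sum_i a_i e^{-jwi} H_hat + sum_i b_i e^{-jwi}
    Phi = np.hstack([-H_hat[:, None] * Ea, Eb])
    # stack real and imaginary parts so the parameter vector stays real
    Phi_ri = np.vstack([Phi.real, Phi.imag])
    rhs = np.concatenate([H_hat.real, H_hat.imag])
    theta, *_ = np.linalg.lstsq(Phi_ri, rhs, rcond=None)
    return theta[:na], theta[na:]

w = np.linspace(0.05, np.pi - 0.05, 40)
z = np.exp(-1j * w)
H = (1.0 + 0.3 * z) / (1.0 - 0.5 * z)   # assumed system: a_1 = -0.5, b = (1, 0.3)
a, b = equation_error_fit(H, w, na=1, nb=1)
```

With noise-free transfer function samples and sufficient model order the fit recovers the generating coefficients; with an estimated $\hat{H}$ the same code returns the least-squares compromise.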

The above least-squares formulation is equivalent to the
following formulation. Choose $\theta$ to minimize the cost


where 


1=0 

rig, 

^(e^'“>:e)  =  1  +  ^ai(0)e-^'" 


(3  -  17) 
(3  -  18) 

(3  -  19) 


In  order  to  deduce  some  desirable  theoretical  properties  it 
will  be  convenient  to  work  with  (3-17)  in  the  rest  of  the 
paper. 

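Lemma 1. Under (AS1)-(AS4), $\lim_{T\to\infty} J_{1T}(\theta) = J_{1\infty}(\theta)$
uniformly in $\theta$ for $\theta \in \Theta_C$, any compact set, where

$$J_{1\infty}(\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| A(e^{j\omega};\theta)\,H(e^{j\omega}) - B(e^{j\omega};\theta) \right|^2 d\omega. \qquad (3\text{-}20)$$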

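Proof: By [7, Thm. 8.11.1], convergence in (3-11) is uniform
in $f$. In particular, for $\omega_l$ on the grid (3-9), given any
$\epsilon > 0$ there exists an integer $N(\epsilon)$ such that

$$\left| \hat{H}(e^{j\omega_l}) - H(e^{j\omega_l}) \right| < \epsilon \quad \text{w.p.1} \qquad (3\text{-}21)$$

uniformly in $\omega_l$ for $T \ge N(\epsilon)$. Moreover, by [13, Prop.
1.2.16] and (3-21), we also have

$$\left| |\hat{H}(e^{j\omega_l})|^2 - |H(e^{j\omega_l})|^2 \right| < \epsilon \quad \text{w.p.1} \qquad (3\text{-}22)$$

uniformly in $\omega_l$ for $T \ge N(\epsilon)$. Consider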

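$$\begin{aligned}
D_l(\theta) &= \left| A(e^{j\omega_l};\theta)\hat{H}(e^{j\omega_l}) - B(e^{j\omega_l};\theta) \right|^2
 - \left| A(e^{j\omega_l};\theta)H(e^{j\omega_l}) - B(e^{j\omega_l};\theta) \right|^2 \\
&= |A(e^{j\omega_l};\theta)|^2 \left[ |\hat{H}(e^{j\omega_l})|^2 - |H(e^{j\omega_l})|^2 \right] \\
&\quad - A(e^{j\omega_l};\theta)B^*(e^{j\omega_l};\theta) \left[ \hat{H}(e^{j\omega_l}) - H(e^{j\omega_l}) \right] \\
&\quad - A^*(e^{j\omega_l};\theta)B(e^{j\omega_l};\theta) \left[ \hat{H}(e^{j\omega_l}) - H(e^{j\omega_l}) \right]^*.
\end{aligned} \qquad (3\text{-}23)$$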

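By compactness of $\Theta_C$ and continuity of $A(e^{j\omega_l};\theta)$ and
$B(e^{j\omega_l};\theta)$ in $\theta$ as well as in $\omega_l$, we have

$$\sup_{\omega_l \in [-\pi, \pi]} \ \sup_{\theta \in \Theta_C} \left| A(e^{j\omega_l};\theta) \right| \le M < \infty \qquad (3\text{-}24)$$

and

$$\sup_{\omega_l \in [-\pi, \pi]} \ \sup_{\theta \in \Theta_C} \left| B(e^{j\omega_l};\theta) \right| \le M < \infty. \qquad (3\text{-}25)$$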

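Therefore, by (3-21)-(3-25), given any $\epsilon_1$ ($= 3M^2\epsilon$) $> 0$,
there exists an integer $N(\epsilon_1)$ such that

$$|D_l(\theta)| < \epsilon_1 \quad \text{w.p.1} \quad \forall \theta \in \Theta_C,\ \forall \omega_l,\ \forall T \ge N(\epsilon_1). \qquad (3\text{-}26)$$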
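The constant $3M^2$ in $\epsilon_1$ can be traced through a short worked step. The display below is a sketch of how the bound appears to follow from the triangle inequality applied to the three terms of (3-23), using (3-21), (3-22) and the uniform bounds (3-24) and (3-25); the arguments $e^{j\omega_l}$ and $\theta$ are suppressed for brevity, which is a notational shortcut introduced here.

```latex
\begin{align*}
|D_l(\theta)|
  &\le |A|^2 \,\bigl|\, |\hat{H}|^2 - |H|^2 \bigr|
     + |A|\,|B|\,|\hat{H} - H|
     + |A|\,|B|\,|\hat{H} - H| \\
  &< M^2 \epsilon + 2 M^2 \epsilon
   = 3 M^2 \epsilon = \epsilon_1 .
\end{align*}
```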

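Now define

$$\bar{J}_{1T}(\theta) = \frac{1}{L} \sum_{l=0}^{L-1} \left| A(e^{j\omega_l};\theta)H(e^{j\omega_l}) - B(e^{j\omega_l};\theta) \right|^2. \qquad (3\text{-}27)$$

Using (3-17), (3-26) and (3-27) it follows that

$$\left| J_{1T}(\theta) - \bar{J}_{1T}(\theta) \right| \le \frac{1}{L} \sum_{l=0}^{L-1} |D_l(\theta)| < \epsilon_1 \quad \text{w.p.1} \qquad (3\text{-}28)$$

$\forall \theta \in \Theta_C$ and $\forall T \ge N(\epsilon_1)$. Finally, for large $L_T$ as the
frequency spacing becomes finer and finer, using the integral
approximation to the summation in (3-27) it follows that

$$\lim_{T\to\infty} \bar{J}_{1T}(\theta) = \frac{1}{\pi} \int_{0}^{\pi} \left| A(e^{j\omega};\theta)H(e^{j\omega}) - B(e^{j\omega};\theta) \right|^2 d\omega
 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| A(e^{j\omega};\theta)H(e^{j\omega}) - B(e^{j\omega};\theta) \right|^2 d\omega = J_{1\infty}(\theta). \qquad (3\text{-}29)$$

The desired result then follows from (3-28) and (3-29). □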

4  Convergence  Analysis 

Define 

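$$\hat{\theta}_T = \arg\{\min_{\theta} J_{1T}(\theta)\} \qquad (4\text{-}1)$$

$$\theta^* = \arg\{\min_{\theta} J_{1\infty}(\theta)\}. \qquad (4\text{-}2)$$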

Using Lemma 1 and some standard arguments we can
establish Theorem 1.

Theorem  1.  Under  (AS1)-(AS4), 

''I" 

where 

V^^'>  :=  I  JicoW  =  }  .  (4-3) 

Proof: Mimic the proof of Theorem 1 in [11] using Lemma
1. Note that the convergence to the set (4-3) is to be
interpreted in the sense of Ljung [5, p. 215]. □

The properties of the set (4-3) have been studied in [9]. First we
need some definitions.

Def. Sufficient Order Model Set: The true model
$H(q^{-1})$ is of the type (1-3) such that the true model orders
$n_{a0}$ and $n_{b0}$ satisfy $\min(n_a - n_{a0},\ n_b - n_{b0}) \ge 0$. •

Def. Reduced Order Model Set (Undermodeling):
Either the true model $H(q^{-1})$ is not of the type (1-3), or it
is but the true model orders $n_{a0}$ and $n_{b0}$ satisfy
$\min(n_a - n_{a0},\ n_b - n_{b0}) < 0$. •

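It has been shown in [9] that under the sufficient order
case, the set (4-3) equals the set

$$\{\theta \mid B(q^{-1};\theta)/A(q^{-1};\theta) = H(q^{-1})\}. \qquad (4\text{-}4)$$

Under undermodeling (reduced order case), by [9], the
polynomial $A(q^{-1};\theta)$ evaluated at the minimizer (4-2)
is minimum-phase; hence the fitted model is stable. Moreover,
under undermodeling, the minimizer (4-2) is unique (i.e. the set (4-3)
is a singleton), and the minimum value of $J_{1\infty}$ is strictly positive.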

Using  the  above  results  from  [9]  and  Theorem  1,  the 
following  result  is  immediate. 

Theorem 2. Under (AS1)-(AS4) and undermodeling,
UmT^co^;^ 

where  is  unique  and  is  given  by  (4-2).  Under  (ASl)- 
(AS4)  and  sufficient  order  modeling, 

If  min(na  -  riao.nb  -  nto)  =  0,  then  is  a  singleton. 


5  Conclusions 

A frequency-domain solution to the least-squares equation
error system identification problem was considered, using
the power spectrum and the cross-spectrum of the input-output (IO)
data to estimate the IO parametric transfer function. The
proposed approach was shown to yield a unimodal performance
surface, consistent identification in colored noise
in the sufficient-order case, and stable fitted models under
undermodeling for arbitrary stationary inputs so long as
they are persistently exciting of sufficiently high order.
This is unlike quite a few existing approaches, such as the
prediction error method, the output error method and
the instrumental variable method.
Abstract

The rank selection problem of a multichannel data
covariance matrix is addressed by the Bayesian methodology.
A maximum a posteriori solution is derived, and
a bootstrap technique for its implementation is proposed.
Our rule is tested on simulated sensor array data that
represent random signals embedded in white Gaussian
noise. The tests include comparisons with the popular
AIC and MDL criteria. The results show that
the Bayesian rule outperforms them, particularly for
low signal-to-noise ratios and small direction-of-arrival
separations.


1. Introduction

In many signal processing applications, the principle
of rank reduction plays an important role [4]. In
sensor array processing it is often applied to determine
the number of signals that arrive at an array using a
finite set of observed data vectors. Rank reduction
is in practice a model selection problem and, as
such, has in recent years been addressed by exploiting
information theoretic criteria [1], [5]-[7].

In this paper, we examine the same problem and
propose a maximum a posteriori (MAP) solution that
is similar in form to the well-known selection rules of
Akaike (AIC) and Rissanen (MDL) [5]. Our rule has
a different penalty for overparameterization, and unlike
the AIC and MDL, the penalty is determined from
the observed data. It contains terms that include
covariance matrices of the estimated model parameters.
To estimate these matrices, we apply a bootstrap
technique as proposed in [3]. Our rank selection procedure
has been tested by computer simulations and compared
to the AIC and MDL.

*This work was supported by the National Science Foundation
under Award No. MIP-9506743.

2.  Problem  Statement 

We formulate the problem using standard assumptions
and the notation from [5]. Namely, a set of $p \times 1$
complex data vectors $x(t)$, $t = 1, 2, \ldots, N$, are observed,
where

$$x(t) = A s(t) + n(t). \qquad (1)$$

Here, $A$ is a $p \times m$ ($p > m$) complex matrix of full
rank whose columns are associated with different signals
and are parameterized by unknown signal parameters,
and $s(t)$ is an $m \times 1$ random complex zero mean
vector whose elements are the waveforms of the $m$ signals.
The term $n(t)$ denotes a $p \times 1$ complex noise
vector, which is a realization of a stationary and ergodic
Gaussian process with zero mean and covariance
matrix $E(n(t)n^H(t)) = \sigma^2 I$. The noise and the signals
are independent.

The covariance matrix of the data can be expressed
as

$$R = \Psi + \sigma^2 I \qquad (2)$$

where $\Psi$ is the signal covariance matrix given by

$$\Psi = A S A^H \qquad (3)$$

with $S$ being defined by $S = E(s(t)s^H(t))$. Since $A$ is a
full column rank matrix and $S$ is by assumption nonsingular,
the rank of $\Psi$ is equal to $m$. On the other hand,
the rank of $R$ is $p$, and its $(p - m)$ smallest eigenvalues
are equal to $\sigma^2$. Thus, if we knew $R$, by observing its
eigenvalues we could directly find the number of signals
in the data. However, $R$ is almost never available in
practical applications, but instead, is estimated from
the observed data vectors. To determine the number
of signals based on the eigenvalues of the estimated $R$,
$\hat{R}$, is not an easy task because the smallest eigenvalues
of $\hat{R}$ are usually not easily distinguished from the
remaining eigenvalues. Our objective is to examine this
problem and determine the rank of $\Psi$ from the
estimated matrix $\hat{R}$.
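The eigenvalue structure of the exact covariance matrix can be checked numerically. The sketch below builds $R$ from (2) and (3) for an assumed half-wavelength uniform linear array with two sources and unit signal covariance, and confirms that the $p - m$ smallest eigenvalues equal $\sigma^2$. The array geometry, the angles and the variable names are illustrative assumptions, not values taken from this section.

```python
import numpy as np

p, m, sigma2 = 7, 2, 1.0                      # sensors, signals, noise variance (assumed)
doa = np.deg2rad([20.0, 28.0])                # assumed directions of arrival
# assumed steering matrix of a half-wavelength uniform linear array (p x m)
A = np.exp(-1j * np.pi * np.outer(np.arange(p), np.sin(doa)))
S = np.eye(m)                                 # assumed signal covariance
R = A @ S @ A.conj().T + sigma2 * np.eye(p)   # exact covariance, eqs. (2)-(3)

# eigenvalues of the Hermitian matrix R in descending order
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
```

With the exact $R$ the gap after the $m$-th eigenvalue is unambiguous; the difficulty discussed above arises only once $R$ is replaced by a finite-sample estimate.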

3.  Criterion  for  Rank  Reduction 

As mentioned before, rank reduction has been
addressed by several authors who used the information
theoretic criteria AIC and MDL [5]-[7]. Here, we propose
a different approach, which is based on the MAP
criterion. We assume that the rank $k$ can take one of
the $q$ values, $k = 1, 2, \ldots, q$, where $q < p$. For each $k$
we have a parametric model, which is denoted by $\mathcal{M}_k$.
The model $\mathcal{M}_k$ is described by the $k$ largest eigenvalues
of $R$, $\lambda_l$, $l = 1, 2, \ldots, k$, their associated eigenvectors, $v_l$,
and the noise variance $\sigma^2$.

Since our objective is to find the rank that has the
maximum a posteriori probability, our criterion can be
expressed as

$$\hat{k} = \arg\max_{k} \{ p(\mathcal{M}_k \mid x(1), x(2), \ldots, x(N)) \} \qquad (4)$$

where $p(\mathcal{M}_k \mid x(1), x(2), \ldots, x(N))$ is the a posteriori
probability of the model given the data records
$x(1), x(2), \ldots, x(N)$. If all the rank hypotheses are a
priori equally probable, the criterion (4) becomes,


If $v_{1r}, v_{1i}, v_{2r}, v_{2i}, \ldots, v_{kr}, v_{ki}$ denote the real and
imaginary components of the eigenvectors, then
$\theta_k^T = [\sigma^2\ \lambda_1 \ldots \lambda_k\ v_{1r}^T\ v_{1i}^{(p-1)T} \ldots v_{kr}^T\ v_{ki}^{(p-1)T}]$, where
the vectors $v_{li}^{(p-1)}$ are of size $p - 1$ with elements
identical to the first $p - 1$ elements of $v_{li}$.

Now, if we apply the selection rule (7), we may
experience a scaling problem. Namely, for two sets
of data which are only related by a scaling constant,
the rule may yield two different results. This is
unacceptable, and therefore we modify (7) so that
the rule is based on the predictive densities
$f(x(L+1), x(L+2), \ldots, x(N) \mid x(1), x(2), \ldots, x(L), \mathcal{M}_k)$, for
$k = 1, 2, \ldots, q$, where the data records $x(1), x(2), \ldots,
x(L)$ can be considered as training data records. With
the same approximations used for obtaining (7), we can
show that the modified rule becomes


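$$\hat{k} = \arg\min_{k} \Bigl\{ -\ln f(x(1), x(2), \ldots, x(N) \mid \hat{\theta}_k, \mathcal{M}_k)
 + \ln f(x(1), x(2), \ldots, x(L) \mid \tilde{\theta}_k, \mathcal{M}_k)
 - \tfrac{1}{2} \ln|\hat{C}_k| + \tfrac{1}{2} \ln|\tilde{C}_k| \Bigr\} \qquad (8)$$

where $\hat{\theta}_k$ and $\tilde{\theta}_k$ are the maximum likelihood estimates
of the model parameters obtained from all the data and
the first $L$ data records, respectively, and $\hat{C}_k$ and $\tilde{C}_k$
are the estimated covariance matrices of $\hat{\theta}_k$ and $\tilde{\theta}_k$,
respectively.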

Now,  using  the  model  assumptions,  our  rule  simpli¬ 
fies  to 


$$\hat{k} = \arg\max_{k} \{ f(x(1), x(2), \ldots, x(N) \mid \mathcal{M}_k) \} \qquad (5)$$

where $f(x(1), x(2), \ldots, x(N) \mid \mathcal{M}_k)$ is the marginal density
of the data given the model $\mathcal{M}_k$. The marginal
density is obtained from

$$f(x(1), x(2), \ldots, x(N) \mid \mathcal{M}_k)
 = \int_{\Theta_k} f(x(1), x(2), \ldots, x(N) \mid \theta_k, \mathcal{M}_k)\, f(\theta_k \mid \mathcal{M}_k)\, d\theta_k \qquad (6)$$

where $\Theta_k$ is the parameter space of the $k$-th model,
and $f(\theta_k \mid \mathcal{M}_k)$ is the a priori density of the model
parameters.

We can show that the criterion (5) can be approximately
expressed by [2]


k  =  arg  min  ^ 
k 


.InnW 


, ...  nL 


.2(p-k)N 


-bln^ 


Af 


~2(p—k)L 


-iln|C,|-l-iln|C,||  (9) 

where $\hat{\sigma}^2 = \frac{1}{p-k} \sum_{l=k+1}^{p} \hat{\lambda}_l$, and $\tilde{\sigma}^2 = \frac{1}{p-k} \sum_{l=k+1}^{p} \tilde{\lambda}_l$.
A critical step of the procedure that implements (8)
is the evaluation of the covariance matrices $\hat{C}_k$ and
$\tilde{C}_k$. These matrices can be obtained by a bootstrap
technique, which is described in the next section along
with some other details of the procedure.


4.  Implementation  by  the  Bootstrap 
Technique 


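$$\hat{k} = \arg\min_{k} \Bigl\{ -\ln f(x(1), x(2), \ldots, x(N) \mid \hat{\theta}_k, \mathcal{M}_k)
 - \tfrac{1}{2} \ln|\hat{C}_k| - \tfrac{d_k}{2} \ln(2\pi) \Bigr\} \qquad (7)$$

where $\hat{\theta}_k$ is the maximum likelihood estimate of $\theta_k$,
$\hat{C}_k$ is the estimated covariance matrix of $\hat{\theta}_k$, and
$d_k$ is the dimension of the model's parameter space.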


To evaluate (8) for every $k$, first we find the correlation
matrix $\hat{R}$ according to

$$\hat{R} = \frac{1}{N} X X^H \qquad (10)$$

where $X = [x(1)\ x(2)\ \cdots\ x(N)]$. Similarly, we obtain
$\tilde{R}$ from the first $L$ data vectors. Next, we determine
the eigenvalues and eigenvectors of $\hat{R}$ and order them
such that $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p$. From the $p - k$ smallest
eigenvalues, we obtain $\hat{\sigma}^2$, and from the so estimated
$\hat{\sigma}^2$ and the eigenvalues, we determine the first term in
(9). We repeat these steps for the matrix $\tilde{R}$, and find
the second term in (9).

To estimate $\hat{C}_k$ and $\tilde{C}_k$, we use a bootstrap
approach [3]. First we estimate the parameters of the
$k$-th model for $M$ different sets of bootstrap data $X_i^*$,
$i = 1, 2, \ldots, M$, where $X_i^*$ is a $p \times N$ matrix whose
columns are randomly chosen from the columns of the
actual data matrix $X$, i.e.,


bootstrap data $X^*$ is $v_l^*$, we rotate the vector $v_l^*$ by an
angle $\varphi$ so that we minimize

$$d = (\hat{v}_l - v_l^* e^{j\varphi})^H (\hat{v}_l - v_l^* e^{j\varphi}).$$

The angle $\varphi$ that minimizes $d$ is

$$\varphi = \arctan \frac{\mathrm{Im}(v_l^{*H} \hat{v}_l)}{\mathrm{Re}(v_l^{*H} \hat{v}_l)}.$$

Finally, after the rotation, we have to choose the sign
of $v_l^* e^{j\varphi}$ so that the rotated vector points in the same
direction as $\hat{v}_l$. The same steps are implemented in
evaluating every eigenvector. So is the case in determining
the eigenvectors of $\tilde{C}_k$.
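The phase alignment of a bootstrap eigenvector to its reference can be sketched as follows. This is an illustrative helper, not the authors' code: the name `align_phase` is hypothetical, and the four-quadrant `np.angle` is used in place of a plain arctangent, so the separate sign choice described above is absorbed into the angle computation.

```python
import numpy as np

def align_phase(v_ref, v_boot):
    """Hypothetical helper: rotate v_boot by e^{j*phi} so that
    ||v_ref - v_boot * e^{j*phi}|| is as small as possible."""
    c = np.vdot(v_boot, v_ref)   # v_boot^H v_ref (vdot conjugates its first argument)
    phi = np.angle(c)            # four-quadrant angle of the inner product
    return v_boot * np.exp(1j * phi), phi

rng = np.random.default_rng(0)
v_ref = rng.standard_normal(7) + 1j * rng.standard_normal(7)
v_ref /= np.linalg.norm(v_ref)
v_boot = v_ref * np.exp(1j * 1.3)          # same eigenvector with an arbitrary phase
v_aligned, phi = align_phase(v_ref, v_boot)
```

After alignment, the scatter of the bootstrap eigenvectors reflects estimation error rather than the arbitrary phase returned by the eigendecomposition.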


$$X_i^* = [x_i^*(1)\ x_i^*(2)\ \cdots\ x_i^*(N)] = [x(l_1)\ x(l_2)\ \cdots\ x(l_N)]. \qquad (11)$$

It should be noted that some columns from the original
matrix may appear more than once in $X_i^*$, and some
not at all. From each of the $M$ bootstrap matrices,
we first estimate the model parameters, and then
determine the covariance matrix of the parameters. The
same procedure is repeated for the first $L$ data records to
estimate $\tilde{C}_k$. Once $\hat{C}_k$ and $\tilde{C}_k$ are found, we compute
the overall criterion of the examined model.
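The resampling loop can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: to stay short, the parameter vector contains only the $k$ largest eigenvalues and the noise variance estimate, whereas the procedure in the text also includes the eigenvector components. The function names `model_params` and `bootstrap_covariance` are hypothetical.

```python
import numpy as np

def model_params(X, k):
    """Simplified parameter vector (an assumption): the k largest eigenvalues of the
    sample correlation matrix followed by the noise variance estimate."""
    p, N = X.shape
    R = X @ X.conj().T / N
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]
    return np.concatenate([lam[:k], [lam[k:].mean()]])

def bootstrap_covariance(X, k, M=300, seed=0):
    """Covariance of the parameter estimates over M bootstrap data matrices."""
    rng = np.random.default_rng(seed)
    p, N = X.shape
    est = np.empty((M, k + 1))
    for i in range(M):
        cols = rng.integers(0, N, size=N)   # draw N column indices with replacement
        est[i] = model_params(X[:, cols], k)
    return np.cov(est, rowvar=False)
```

Calling `bootstrap_covariance(X, k)` on all $N$ records and again on the first $L$ records gives the two covariance estimates whose log-determinants enter the criterion.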

Recall that the $k$-th model parameters are the
largest $k$ eigenvalues, the associated eigenvectors, and
the noise variance. The eigenvectors have to be treated
carefully for two reasons: first, they are
normalized, and second, they are not unique.
Since the eigenvectors satisfy

$$v_l^H v_l = 1, \quad l = 1, 2, \ldots, k \qquad (12)$$

not all the elements of $v_l$ are free parameters. If $v_l$ is
of length $p$, due to (12), the number of free parameters
is $2p - 1$. Therefore, in defining $\theta_k$, we have to exclude
the non-free parameters. In our definition of $\theta_k$ we
exclude the last component of the imaginary part of
each eigenvector. Therefore, the sizes of the $\hat{C}_k$ and
$\tilde{C}_k$ matrices are $(2pk + 1) \times (2pk + 1)$.

Note also that if $v_l$ is the eigenvector corresponding
to the $l$-th eigenvalue of $R$, i.e.

$$R v_l = \lambda_l v_l$$

then any vector $v_l(\varphi) = v_l e^{j\varphi}$ is also a legitimate
eigenvector of $R$. Since we use the eigenvectors to compute
the covariance matrix of the model parameters, this
ambiguity is undesirable. So, in our implementation
of the bootstrap algorithm we proceed as follows. If
the maximum likelihood estimate of the $l$-th eigenvector
obtained from $X$ is $\hat{v}_l$, and the estimate from the


5.  Simulation  Results 

We have tested the MAP rule in three experiments
and compared it with the AIC and MDL by using
computer simulated data. The columns of the matrix $A$
were given by

$$a^T(\phi_k) = [1\ \ e^{-j\pi\sin(\phi_k)}\ \ \cdots\ \ e^{-j(p-1)\pi\sin(\phi_k)}]$$

where $k = 1, 2, \ldots, m$, and $\phi_k$ is the direction of arrival
of the $k$-th signal.

In all the experiments, there were two signals ($m = 2$)
whose amplitudes were given by

s^(^)  =  (I4) 

where $\eta_1$ and $\eta_2$ are independent and uniformly
distributed random variables in the interval $(0, 2\pi)$. The
number of sensors was $p = 7$, and the maximum possible
rank $q = 4$. The evaluation of the covariance
matrices was carried out by $M = 300$ bootstrap data
matrices, and the number of training data records was
$L = 10$. In each experiment there were 100 independent
trials. The AIC and MDL rules used were

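$$\hat{k}_{AIC} = \arg\min_{k} \left\{ -2 \ln \left( \frac{\prod_{l=k+1}^{p} \hat{\lambda}_l^{1/(p-k)}}{\frac{1}{p-k} \sum_{l=k+1}^{p} \hat{\lambda}_l} \right)^{(p-k)N} + 2k(2p - k) \right\} \qquad (15)$$

and

$$\hat{k}_{MDL} = \arg\min_{k} \left\{ -\ln \left( \frac{\prod_{l=k+1}^{p} \hat{\lambda}_l^{1/(p-k)}}{\frac{1}{p-k} \sum_{l=k+1}^{p} \hat{\lambda}_l} \right)^{(p-k)N} + \frac{k(2p - k) \ln N}{2} \right\}. \qquad (16)$$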
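The two reference rules are straightforward to evaluate from the ordered eigenvalues. The sketch below assumes the widely used eigenvalue form of the AIC and MDL criteria (log-likelihood term built from the ratio of the geometric to the arithmetic mean of the $p - k$ smallest eigenvalues); the function name `aic_mdl_rank`, the search range $k = 0, \ldots, p-1$ and the sample eigenvalues are illustrative assumptions rather than values from the experiments.

```python
import numpy as np

def aic_mdl_rank(eigvals, N):
    """Hypothetical helper: return (k_AIC, k_MDL) from sample eigenvalues and
    the number of data records N, searching k = 0..p-1."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    aic, mdl = [], []
    for k in range(p):
        tail = lam[k:]
        # log of (geometric mean / arithmetic mean) of the p-k smallest eigenvalues
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        loglik = (p - k) * N * log_ratio
        aic.append(-2.0 * loglik + 2.0 * k * (2 * p - k))
        mdl.append(-loglik + 0.5 * k * (2 * p - k) * np.log(N))
    return int(np.argmin(aic)), int(np.argmin(mdl))

# made-up eigenvalues: two dominant values above a nearly flat noise floor
k_aic, k_mdl = aic_mdl_rank([10.0, 6.0, 1.05, 1.0, 0.98, 0.97, 0.95], N=80)
```

Both penalties depend only on $k$, $p$ and $N$, which is the contrast with the data-dependent penalty of the MAP rule.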

In the first experiment the signal-to-noise ratio
(SNR), defined by SNR $= 10\log(1/\sigma^2)$, was equal to
0 dB, and the directions of arrival were $\phi_1 = 20°$ and
$\phi_2 = 28°$. The number of data records was $N = 80$.
The results are shown in Table 1. Each entry represents
the number of times the MAP, AIC, and MDL
         k = 1    k = 2    k = 3    k = 4
MAP        0       99        1        0
AIC        0       95        4        1
MDL        0      100        0        0

Table 1. Performance of the MAP, AIC,
and MDL rules in 100 trials for SNR = 0
dB, $\phi_1 = 20°$, $\phi_2 = 28°$, and $N = 80$. The
correct model is $\mathcal{M}_2$.


         k = 1    k = 2    k = 3    k = 4
MAP        2       92        5        1
AIC        7       89        3        1
MDL       47       53        0        0

Table 3. Performance of the MAP, AIC,
and MDL rules in 100 trials for SNR = 0 dB,
$\phi_1 = 22°$, $\phi_2 = 28°$, and $N = 50$. The correct
model is $\mathcal{M}_2$.


         k = 1    k = 2    k = 3    k = 4
MAP        2       89        8        1
AIC       13       74       12        1
MDL       77       23        0        0

Table 2. Performance of the MAP, AIC,
and MDL rules in 100 trials for SNR = -3
dB, $\phi_1 = 20°$, $\phi_2 = 28°$, and $N = 50$. The
correct model is $\mathcal{M}_2$.


rules chose the models with ranks $k = 1, 2, 3$, and 4,
respectively, out of 100 trials. From the results we observe
that all the rules showed excellent performance.

In the second experiment, we decreased the SNR to
-3 dB and the number of data records to $N = 50$, but
kept all the remaining parameters identical to those in
experiment 1. The results are shown in Table 2. The
performance of the MDL degraded significantly. The
AIC performed better, and the MAP was the best.

Finally, in the third experiment, we decreased the
separation of directions of arrival by setting $\phi_1 = 22°$
and $\phi_2 = 28°$, increased the SNR to 0 dB, and kept
the remaining parameters unchanged as in experiment
2. The results are shown in Table 3. Again, the MDL
performed poorly, and the MAP had the best performance.

6.  Conclusions 

A  new  approach  to  rank  determination  of  covariance 
matrices  has  been  proposed.  It  is  based  on  the  MAP 
criterion  and  implemented  by  the  bootstrap  method. 
The  method  in  this  paper  requires  assumptions  of  a 
specific  structure  of  the  covariance  matrix.  Current 
research  is  focused  on  relaxing  these  assumptions  to 
make  the  current  procedure  applicable  in  a  wider  set 
of  scenarios. 
Abstract

Using the information theoretic criterion, the authors
obtained in [4] three consistent estimates of the number
of signals for an additive model with white noise. In
this paper the rates of convergence for the probabilities
of wrong detection as a function of the sample size are
studied. It is proved that under certain conditions and
for a fairly general class of penalty terms, the probabilities
of wrong detection are exponentially decreasing.


1  Introduction 

In signal processing, a problem of great interest is
the determination of the number of signals transmitted
in the presence of noise. The received signal vector $x(t)$
is a $p \times 1$ complex vector given by

$$x(t) = A s(t) + n(t)$$

where $A$ is the $p \times q$ matrix $A = [A(\phi_1), A(\phi_2), \ldots, A(\phi_q)]$
and $A(\phi_i)$ is a $p \times 1$ complex vector which depends on
some unknown $\phi_i$ associated with the direction of arrival
for the $i$th signal, $s(t) = (s_1(t), s_2(t), \ldots, s_q(t))'$,
$s_i(t)$ is the $i$th complex waveform signal, and $n(t)$ is a
$p \times 1$ complex vector associated with the noise. The
assumptions made here are (1) $q < p$; (2) $s(t)$ is complex
multivariate normal with mean vector 0 and nonsingular
covariance matrix $\Psi$; (3) the noise vector $n(t)$ is
complex multivariate normal with mean vector 0 and
covariance matrix $\sigma^2 I_p$, where $I_p$ is the $p \times p$ identity
matrix, and $n(t)$ is also independent of the signals. The
covariance matrix of $x(t)$ is given by

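$$\Sigma = A \Psi A^* + \sigma^2 I_p$$

where $A^*$ denotes the transpose of the complex conjugate
of $A$. The number of signals transmitted, $q$, is
equal to the rank of $A \Psi A^*$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be
the eigenvalues of $\Sigma$, so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q > \lambda_{q+1} = \cdots = \lambda_p = \sigma^2.$$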

Let $\{x(t_1), x(t_2), \ldots, x(t_n)\}$ be a set of observations
and $nS = \sum_{i=1}^{n} x(t_i)x^*(t_i)$. Then $E(S) = \Sigma$. Suppose
that $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_p$ are the eigenvalues of $S$. Let
$H_k$ denote the hypothesis that

$$H_k : \lambda_1 \ge \cdots \ge \lambda_k > \lambda_{k+1} = \cdots = \lambda_p = \sigma^2.$$

$M_k$ is the model that $H_k$ is true. Let

$$I(k, C_n) = L(k) + \nu(k, p)\,C_n. \qquad (1)$$

Here $L(k)$ is a statistic which will be specified later,
$\nu(k, p)$ denotes the number of free parameters that have
to be estimated under $H_k$, and $C_n$ is some constant
chosen to depend on $n$. The criterion for determining the
number of signals is to estimate the number of signals
$q$ by $\hat{q}_n$, which is chosen so that

$$I(\hat{q}_n, C_n) = \min\{I(0, C_n), I(1, C_n), \ldots, I(p - 1, C_n)\}.$$

$\nu(k, p)C_n$ in this case is called the penalty term. The
choice of $C_n$ is crucial and its selection was discussed
in [5]. In general $C_n$ is chosen to satisfy the following
conditions:


$$\lim_{n\to\infty} \frac{C_n}{n} = 0; \qquad \lim_{n\to\infty} \frac{C_n}{\log\log n} = \infty, \qquad (2)$$


and  u{k,p)  is  either  |  or  |(2p  -  -h  1).  Let 


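$$L_1(k) = -n \Bigl\{ \sum_{i=k+1}^{p} \log \delta_i - (p - k) \log\Bigl( \frac{1}{p-k} \sum_{i=k+1}^{p} \delta_i \Bigr) \Bigr\},$$

$$A_k = -\Bigl\{ \sum_{i=k+1}^{p} \log \lambda_i - (p - k) \log\Bigl( \frac{1}{p-k} \sum_{i=k+1}^{p} \lambda_i \Bigr) \Bigr\}.$$

It can be shown that $-L_1(k)$ is the likelihood ratio
test statistic for testing $H_k$ under the assumptions of
normality and independence of observations.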
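The detection rule built on $L_1(k)$ and the penalty term can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the parameter count $\nu(k, p) = k(2p - k) + 1$ and the choice $C_n = \log n$ (which satisfies both conditions in (2)) are assumptions made for the example, as are the function name `detect_num_signals` and the sample eigenvalues.

```python
import numpy as np

def detect_num_signals(delta, n, C_n, nu=lambda k, p: k * (2 * p - k) + 1):
    """Hypothetical helper: minimise I(k, C_n) = L_1(k) + nu(k, p) * C_n over
    k = 0..p-1, where delta holds the eigenvalues of the sample covariance S."""
    d = np.sort(np.asarray(delta, dtype=float))[::-1]
    p = d.size
    scores = []
    for k in range(p):
        tail = d[k:]
        # L_1(k) is nonnegative by the arithmetic-geometric mean inequality
        L1 = -n * (np.sum(np.log(tail)) - (p - k) * np.log(np.mean(tail)))
        scores.append(L1 + nu(k, p) * C_n)
    return int(np.argmin(scores))

n = 80
q_hat = detect_num_signals([10.0, 6.0, 1.05, 1.0, 0.98, 0.97, 0.95], n, C_n=np.log(n))
```

A slowly growing $C_n$ keeps the penalty large enough to rule out overestimation while remaining negligible relative to the order-$n$ growth of $L_1(k)$ for $k < q$.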
With $L(k) = L_1(k)$ in (1) it was proved in [5] that
the estimate of $q$ by $\hat{q}_n$ is consistent. Its rate of
convergence was shown to be exponential in [1]. The same
result was obtained in [2] where white noise is not
assumed. Recently a different formulation of $L_1(k)$ was
provided in [3]. However, it is not known if it will give
a consistent estimate of $q$. Based on the result of [3],
the authors proposed several consistent estimates in
[4]. The main purpose of this paper is to investigate
the rate of convergence for three of these consistent
estimates. It will be proved that for a fairly general class
of $C_n$ functions, the rate is again exponential.


2  Preliminary  Lemmas 


Lemma  2.1:  Suppose  that  0  <  cc  <  and 

$\max_{i,j} |\sigma_{ij} - s_{ij}| < \alpha$. The following are true:

1.  0  <  Li{k)  <  k  >  q\ 

2.  Li{k)  +  v{k,p)Cn- Li{q)-v{q,p)Cn  >n(At/2- 
Ap^a/cr'^  -  p^a^/cr'^)  -  v{p  -  l,p)C„,  ilk  <q 

Proof:  For  the  proof,  see  Theorem  3.1  of  [1]. 

Let $\Sigma = (\sigma_{ij})$ and $S = (s_{ij})$. The following
result can be found in [1]:

niax  IcTiy  —  Sij  I  <  a  =>  1-^t  —  |  ^  *  =  1, 2, . . . ,  p 

(3) 

For  ^  >  0,  define 

Aac(0  =  «(Z)r+  YL  + 

irrl  t=jfc+l  i=l 

k 

+  (p  -  k){n  —  k)  log  Afc+i  +  ^2  ^ j  +  0 

t.  i  =  1 

•  <  i 

k 

-  fc)log(Ai  -  A)fe+i  -f-0 

i=l 

XI  log(<5i  - 

i,i  =  fc  +  1 
»  <  i 


where  A,-  and  Afc4.i  are  solutions  of 

A  -  /!  _  X  V 

n  2^  A,  -A.+^ 

»  =  1 


Ajb-i-1  =  d* 


* 

p-k  AjA^ -  j  =  l,2,...,k 

n  A,•-A,^.l-^^’ 

2  .  ^  AjAfe^i 


(4) 


n  ^  A{  —  Ajb+i  +  C 

|At~Aj|  <  p^,  |A,1  <  ns  logn,  i,  j  =  1, 2, . . . ,  -h  1 


where  0  <  p  <  1  is  a  constant,  Si=jb+i  di¬ 

Lemma 2.2: There is an $N > 0$ such that for all
$n \ge N$ the system of equations (4) has at most one
solution.

Proof:  Define 


i  =  1 
i  ^  j 


P  k  XjX/..^i 

n  Xj-Xk+i+^’ 


forj  =  1,2,...,!; 


1  *  ^ 

9k+l{xi,^2,-  ■  ■  ,  ®*:+l)  =  ~  X! 


XiXk+1 


n  ^ — '  Xi 
i=l  ' 


Xk  +  1 


and  G{x)  =  (fifi,  ff2,  •  •  • ,  ffifc+i)(*)-  Then  G{^  =  y  if  and 
only  if  y  is  a  solution  of  (4).  Let  Dn  =  {x  :  \xi  —  Xj\< 
pi,  \xj\<'n?l^\ogn,i,j  =  \,2,...,k-\-l}.  Then  D„ 
is  convex.  It  is  easy  to  show  that  on  Dn, 


M  =  '^\\V9ji^\\ 

where $\|\cdot\|$ denotes the Euclidean norm. Suppose that
there exist two distinct fixed points $x$ and $y$ in $D_n$. By
the mean value theorem there is a $z \in D_n$ on the line
segment joining $x$ and $y$ such that

i?-  G{x)  -  G{y)  =  ^  iO 

Let $N$ be so that for all $n \ge N$, $M < 1$. Then
$\|x - y\| = \|G(x) - G(y)\| \le M\|x - y\| < \|x - y\|$. This is a contradiction
and the uniqueness of the solution is proved.

Lemma  2.3:  Suppose  that  p^  >  Ai  —  and 
maxijlaij  -  5ij|  <  ao  <  rmn{a^ / {2p) ,  {p^  -  Ai  + 
cr^)/(4p)}.  Then  there  is  a  TV  >  0  such  that  for  n  >  N 
the  system  (4)  has  a  unique  solution  and 

\6i-k\  <  i  =  l,2,...,k 

n 

Id^-At+il  <  -  (5) 

n 

where  7  =  p(Ai  4-  2pao)^/[(l  -  p)i]- 

Proof:  Consider  the  subset  of 
Ek  =  {ki  -  <5*1  <  ao,  i  =  1,2,...,!;,  |xj,+i  -  d-^|  < 
ao  and  \xi  -  Xj\  <  p^}.  Let  N  be  large  enough  such 
that  (p  -  l)[Ai  +  (p  +  l)ao]V(iV(l  -  P)0  <  “0-  If 
n>  N,  then  it  can  showed  that 

l^j  -  i  =  1,2, ...  ,1: 

-  Pib+i(s)|  <  ao, 

ki (®)  -  Sf;  (®)  I  <  l'5i  -  1  +  ao  <  Ai  -  (T^  -4-  4pao  <  pi 

i,j=l,2,...,k+l. 
Therefore $G(x) \in E_k$. Clearly $G(x)$ is continuous. By
the fixed point theorem the system (4) has a solution
$(\hat{\lambda}_1, \ldots, \hat{\lambda}_{k+1})$ in $E_k$ and it follows that


\6i-K\  < 

|a-^  -  Ajfc+i(x)|  < 


1  p(Ai  +  2pao)^ 
n  (1  -  p)^ 

1  p(Ai  +  2pao)^ 
n  (1  -  p)^ 


i  =  . . .  ,k 


By  Lemma  2.2,  the  solution  is  unique. 

Lemma  2.4:  Suppose  that  >  Xi  —  and 
maxtj  \aij  —  Sjj|  <  a  <  ao  <  min{cr^/(2p),  {p^  -  Ai  + 
cr^)/(4p)}.  There  is  a  >  0  such  that  for  n  >  A^,  we 
have 


1  4-  Ri{i,  k),  z  =  1, 2, . . . 


where  =  |i(2p  -  i  -  1)  log  n  -  ELp-fc+i  log  T{t). 

With  N  given  in  Lemma  2.3  and  n  >  AT,  we  have 
the  following  theorem. 

Theorem  3.1:  Assume  that  >  Ai  —  and  as¬ 
sume  also  that  the  following  conditions  are  satisfied: 

(a)  0  <  a  <  ao  <  min{crV(2p),  {p^  -  Aj  +  cr2)/(4p)}; 

(b)  mm{z/(^  +  l,p)  -  v{k,p))Cn}  >  + 

V 

2inax{Ajfc}  +  El°gr(i); 

i-\ 

(c)  min{Afc}  >  Va/cr^+pV/or^+i/(p-l,p)C'„/n+ 

logn/n  +  2max{Ait}/n. 
k 

Then 


(b)  ^  =  {p-k)[l  +  R2ik)] 

(c)  log(A,)  =  log  Si  +  Rz{i,  k),  i=l,2,...,k 

(d)  log(A,+  i)  =  log(a2)  +  i?4(^) 

(e)  log(Ai  -  Aj  +  0  =  logC-^i  -  Sj  +  0  +  R^{i,  j,  k) 
i,j  =  1,2, . . .  ,k,  i  <  j 

(f)  log(Ai  -  Ajfc+i  +  ^)  =  log(5i  -  6-^  +  ^)  +  Re{i,  k) 
i,j  =  1,2,... ,k 

where  |i2,  |  <  p/n  (1  <  *  <  6),  7  aa  in  Lemma  2.3  and 
_  f  (Ai+pao)7  7  2j 

p  -  maxi  (^2  _  (p  V2  -  (p  +  l)ao  ’  (1  -  p)^  ^ 

Proof:  The  proof  requires  simple  calculus. 

3  Rates  of  Convergences 

Suppose  that  the  assumptions  made  in  Lemma  2.3 
are  satisfied.  Using  the  expansion  of  Lemma  2.4,  we 
may  rewrite 

p 

A.,(e)  =  iiW  +  »i:  log(5i  +  np-f  /?(n,  k) 

i=l 

The  following  simple  bound  for  /?(n,  k)  can  be  obtained 
by  the  estimates  of  Lemma  2.4  and  (3). 

\l3(n,k)\  <  2p^[p+  |log(Ai  +pao)|  V  |log(cr^  -pao)|+ 
I  log(Ai  -  Ap  +  2pao  +  01  V  i  log((l  -  p)^  -  27/7V)|] 

def  r 
=  At. 

For  the  first  type  of  estimates  define  L{k)  in  (1)  by 
m  =  Aac{0+Pac  (6) 


EiQn  ^  Q\Rq)  S  ^  ^ij  \  ^  <^)- 

*  j 

Proof:  Suppose  that  maxij  \<rij  —  Sij  \  <  a.  By  the 
assumptions,  the  results  of  Lemma  2.3  and  2.4  hold. 
By  the  definition  of  I{kjCn),  we  have 

I{k,Cn)  -  liq,  Cn)  =  ^{k  -  g)[2p  -  {q k)  -  1]  logn 
[Li(A;)  -I-  u{k,p)Cn  -  Li{q)  -  u{q,p)Cn]  +  f3{n,k) 

-l3{n,q)-\-  logr(2)-  \ogT{i). 

i-p-q+l  i=p-k+l 

Hence, for $k > q$ it follows from Lemma 2.1 and
assumption (b) that

I{k,Cn)  -  I{q,Cn)  >  {i'iq+  l,p)  -  i'iq,p))C„  (7) 

P 

-  2np^a^  —  2max{Ajk}  -  X]logr(z)  >  0. 

i=l 

For k < q it follows from Lemma 2.1 and assumption
(c) that

l^{k,  Cn)  -  I{q,  Cn)  >  n(Afc  -  Ap'^afa^  —  p^a^fcr^) 

-  v{p-  l,p))Cn  -  2max{Ajb}  -p^logn  >  0.  (8) 

In view of (7) and (8), we have qn = q. Therefore,

P{qn  ^  q\H,)  <  E  E  l  >  «)• 

*  j 

If  i^ikjp)  is  a  strictly  increasing  function  of  k  and 
a  =  a{n)  |  0,  —  0,  oo,  (9) 

then for p^ > Ai - cr^ we have the following theorem.

Theorem 3.2: If a(n) and Cn satisfy (9),
then the probability of wrong detection using qn satisfies
the following inequality:

P{q„  ^  q\Hg)  I  - 

*  j 

Proof:  It  is  obvious  that  the  assumptions  made  in 
Theorem  3.1  hold.  Then  the  result  follows  from  Theo¬ 
rem  3.1. 

For  ^  >  0  the  second  type  of  estimates  is  defined  by 
letting  L{k)  in  (1)  to  be 

k 

L2{k)  =  (n  -  p  +  1)  ^  log  6i  +  (p  -  k){n  -  k)  log(a-^) 
1=1 

k  k 

+  ^  log((5i-<5j +0  +  ^(p-k)log(<5i-^^+0 

i,  J  =  1  i=l 

»  <  3 

P  P 

^  log((5i-(5j+0  =  Li(k)  +  n^log(5i 

i,  j  =  fc  +  1  ^ 

i  <  3 

+  I3{n,k)-  ^  log((5i  -  (5j +^)  (10) 

*.  i  =  fc  +  1 

*  <  3 

k  k 

0{n,k)  =  ^(p  -  k)  log((5,-  -  6-^  +  ^)  +  (1  -  p)  ^  log  (5,- 
«=i  «=i 

+  2  ^  log((5i  -  <5j  +  0  -  k(p  -  k)  log(o-2) 

*.  i  =  fc  +  1 

i  <  3 

For  the  third  type  of  estimates,  use 

I{k,Cn)  =  L2{k)  +  Pac  +  t'ik,p)C„  (11) 

with L2(k) as in (10). Let qn be the estimate obtained
by either the second or the third type. Then the following
theorem, whose proof is similar to that of Theorem 3.2,
holds.

Theorem 3.3: If a(n) and Cn satisfy (9),
then the probability of wrong detection using qn satisfies
the following inequality:

P{qn  #  q\Hg)  <  E  E  I  ^  “) 

i  i 

In  the  following,  qn  will  denote  an  estimate  given  by 
any  of  the  three  types  mentioned. 

Theorem 3.4: Suppose that x1, x2, ... are i.i.d. vectors
of order p x 1 such that E(x1) = 0, E(x1 x1*) = Σ

and  <  oo  for  some  k  >  1.  Also  let  Cn  in  (1)  be 


chosen  so  that  Cn,  a{n)  and  i^{k^p)  satisfy  (9).  Then 
for  any  s  >  /c,  we  have 

P(g„  #  q\Hg)  =  0{n/inar)  +  0((na2)-.) 

as n → ∞.

Proof: Identical to that of Theorem 3.2 [1].
Similarly, the following results hold, as in [1]:

Corollary 3.1: In Theorem 3.4, if we take a =
a(n) ↓ 0 as a slowly varying function and Cn = na,
then

P(g„7^g|P,)  =  0(ni-'=(a)-'') 

as n → ∞.

Theorem 3.5: Suppose that x1, x2, ... are i.i.d.
with E(x1) = 0, E(x1 x1*) = Σ and E{exp(κ|x1|²)} < ∞
for some κ > 0. Then

P(qn ≠ q | Hq) ≤ c exp(−b n a²)

as n → ∞ for some constants b > 0 and c > 0.

Corollary 3.2: If a(n) ↓ 0 is a slowly varying function,
Cn = a(n)n and the conditions of Theorem 3.5
are satisfied, then for any ε > 0

P(qn ≠ q | Hq) ≤ c exp(−b n^(1−ε)).


References 

[1]  Z. D. Bai, P. R. Krishnaiah and L. C. Zhao,
"On rates of convergence of efficient detection criteria in
signal processing with white noise", IEEE Trans. Inform.
Theory, vol. 35, pp. 380-388, 1989.

[2]  K. W. Tam and Y. Wu, "On the convergence rate
of general information theoretic criteria in signal processing
when the covariance matrix is arbitrary", IEEE
Trans. Inform. Theory, vol. 37, pp. 1667-1671, 1991.

[3]  K. M. Wong, Q. T. Zhang, J. P. Reilly and P. C. Yip,
"On Information Theoretic Criteria for Determining the
Number of Signals in High Resolution Array Processing",
IEEE Trans. Acoust., Speech and Signal Processing,
vol. 38, pp. 1959-1970, 1990.

[4]  Y. Wu and K. W. Tam, "On Estimation of Number of
Signals", submitted.

[5]  L. C. Zhao, P. R. Krishnaiah and Z. D. Bai, "On Determination
of Number of Signals in Presence of White
Noise", J. Multivariate Anal., vol. 20, pp. 1-25, 1986.




Asymptotic Statistics of AR Spectral Estimators for
Processes Containing Mixed Spectrum

Peter Sherman, Iowa State University, Ames, Iowa, U.S.A.
Lang White, DSTO, Australia
Joanna Spanjaard, CRASys, Austria
Robert Bitmead, Austr. Natl. University, Australia


Abstract 

We  address  the  influence  of  point  spectrum  on  the 
large  sample  statistics  of  the  AR(n)  spectral  estimator 
for  fixed  n  as  well  as  for  the  case  where  n  approaches 
infinity.  For  fixed  n  we  obtain  the  distribution  of  this 
estimator.  We  also  obtain  approximate  expressions  for 
its  mean  and  variance.  These  expressions  involve  the 
nth  order  Capon  spectrum.  Using  recently  discovered 
convergence  properties  of  this  spectrum  as  n 
approaches  infinity,  we  show  that  these  expressions 
depend  on  the  ratio  of  the  AR(n)  to  the  nth  order 
Capon  spectrum.  This  ratio  gives  insight  into  the 
statistical  influence  of  point  spectrum  on  the  AR(n) 
spectral  estimator,  based  on  the  well  known  difference 
in  the  resolving  properties  of  these  two  spectra. 
Simulations  are  included  to  support  the  theoretical 
results.  Finally,  it  is  hoped  that  our  attempt  to  bring 
to  bear  a  number  of  recently  published  results  in  this 
area  will  also  contribute  to  a  better  understanding  of 
it,  and  possibly  stimulate  further  investigations. 

Introduction 

This  work  addresses  the  statistics  of  spectral 
estimators  associated  with  a  zero  mean  wide  sense 
stationary  (wss)  random  process  having  mixed 
spectrum.  From  the  Wold  decomposition,  any  wss 
random process, $Y_t$, has a decomposition of the form

$Y_t = X_t + U_t$  (1a)

where $X_t$ has an absolutely continuous spectral
density, and where $U_t$ is independent of $X_t$ and is
perfectly predictable given $\{U_s ; s < t\}$. In this
work we restrict $U_t$ to be a harmonic process; that is,

$U_t = \sum_k A_k \cos(\omega_k t + \theta_k)$  (1b)

where the $\{\theta_k\}$ are independent and identically
distributed (iid) uniformly over the interval $[-\pi, \pi]$,
and where $\{A_k, \omega_k\}$ are unknown parameters.
Consequently, $Y_t$ has an autocorrelation of the form

$r_y(\tau) = r_x(\tau) + \sum_k \frac{A_k^2}{2} \cos(\omega_k \tau)$  (2)

where $\{r_x(\tau)\}_{\tau=-\infty}^{\infty}$ is absolutely summable. The
statistics of spectral estimators related to the regular
process, $X_t$, have been studied extensively over the
years. Only relatively recently, however, have those of
$Y_t$ received much attention. There are a number of
reasons  for  this  recent  interest.  One  is  no  doubt  due  to 
the  increasing  importance  of  such  processes  in  the 
engineering  and  physical  sciences.  An  example  is  the 
spectral  analysis  of  signals  associated  with  periodic 
systems  such  as  rotating  machinery  in  order  to 
identify  cyclostationary  behavior  [1].  Complications 
introduced  by  the  presence  of  a  harmonic  process  in 
these  areas  are  illustrated  in  [1].  In  fact,  given  the 
adverse  influence  of  such  a  process  on  practically  all 
methodologies  related  to  not  only  spectral  estimation, 
but  also  to  system  identification  and  feedback  control, 
one  might  wonder  why  processes  such  as  (1)  have 
been  of  such  limited  interest.  At  least  a  partial  answer 
to  this  question  relates  to  the  mathematical 
difficulties imposed by (1b). After all, a key
assumption  found  in  all  these  areas  is  that  the 
autocorrelation  function  decay  sufficiently  fast; 
whereas  (2)  does  not  decay  at  all. 
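
The non-decaying harmonic term in (2) can be checked with a short worked derivation. The sketch below assumes a single harmonic component of the form $A\cos(\omega t + \theta)$ with $\theta$ uniform on $[-\pi, \pi]$; the cosine form and the symbol $\theta$ are assumptions made here for illustration.

```latex
\begin{aligned}
E\big[A\cos(\omega t + \theta)\, A\cos(\omega (t+\tau) + \theta)\big]
  &= \frac{A^2}{2}\, E\big[\cos(\omega\tau) + \cos(\omega(2t+\tau) + 2\theta)\big] \\
  &= \frac{A^2}{2}\cos(\omega\tau) + \frac{A^2}{2}\cdot\frac{1}{2\pi}
     \int_{-\pi}^{\pi} \cos(\omega(2t+\tau) + 2\theta)\, d\theta \\
  &= \frac{A^2}{2}\cos(\omega\tau).
\end{aligned}
```

The integral vanishes because the integrand completes two full periods over $[-\pi, \pi]$. The result is a cosine in $\tau$ whose amplitude does not shrink with the lag, which is why the usual decay assumption on the autocorrelation fails here.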

The  goal  of  this  paper  is  to  characterize  the  large 
sample  statistical  properties  of  a  particular  class  of 
spectral  density  estimators,  namely  autoregressive 
(AR)  estimators.  We  consider  both  a  fixed  order,  n, 
model,  as  well  as  when  n  approaches  infinity.  Much  of 
this  characterization  will  be  obtained  by  piecing 
together  recent  results  of  other  researchers,  and  in 
particular,  those  published  in  statistical  journals.  A 
significant  portion  will  also  follow  directly  from  the 
convergence  results  of  [2]  related  to  the  family  of 
Capon  spectral  estimators  (see  [1]  for  more  recent 
related  references).  Consequently,  while  we  believe 
that  this  work  contains  valuable  original 
contributions,  it  is  also  our  intent  to  contribute  by 
combining  a  collection  of  recent  developments  along 
these  lines  into  a  self-contained  work.  To  begin,  we 
define  the  spectral  density  and  power  spectrum  for  the 
process (1):

=  5j.(w)  +

^y(")-  fe)  (2n  +  l)“^^ry(r)e 

=  ^lr(“±"l)  •  <^‘’> 

We  remark  in  the  mixed  spectrum  setting  it  follows 
from  (3)  that  the  spectral  density  is  not  well-defined, 
in  the  sense  that  it  becomes  unbounded  as  the  number 
of  available  autocorrelation  lags  approaches  infinity. 
Moreover,  the  power  spectrum  contains  no 
information  about  the  spectral  density.  Hence,  it  is 
natural  to  expect  that  the  AR(n)  spectral  estimator 
will  be  poorly  behaved  near  the  point  spectrum 
frequencies,  and  furthermore,  that  in  the  limit  (as 
$n \to \infty$) it will not exist at these frequencies. To arrive
rigorously  at  these  conjectures,  consider  the  AR(n) 
prediction  model 

$\hat y_t = -\sum_{k=1}^{n} a_k y_{t-k}$  (4)

where $a = [a_1, \ldots, a_n]^{tr}$, and define the prediction
error $\sigma^2 = E(y_t - \hat y_t)^2$. For clarity of understanding
and notation, we present the minimum variance and
least squares equations.

Minimum Variance Approximation of $a$ and $\sigma^2$

$a_{mv} = -R^{-1} r$ ;  $\sigma^2_{mv} = r_y(0) + a_{mv}^{tr} r$  (5)

where $R = \{r_y(i-j)\}_{i,j=1}^{n}$
and $r = [r_y(1), \ldots, r_y(n)]^{tr}$. In the statistical sense
used throughout this paper $a_{mv}$ is not an estimator,
since it is not random. For this reason, $a_{mv}$ is termed
the minimum variance approximant of $a$.

Least Squares Estimation of $a$ and $\sigma^2$

$\hat a = -\hat R^{-1} \hat r$ ;  $\hat\sigma^2 = \hat r_y(0) + \hat a^{tr} \hat r$  (6)

$\hat r_j = (1/N) \sum_{t=1}^{N-j} y_t y_{t+j}$.  (7)
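
As an illustration of the minimum variance equations, the following Python sketch solves $a_{mv} = -R^{-1} r$ and $\sigma^2_{mv} = r_y(0) + a_{mv}^{tr} r$ from a given autocorrelation sequence. The helper names `ar2_autocorr` and `mv_approximant`, the unit innovation variance, and the reading of the AR(2) coefficients as $x_t = 1.13 x_{t-1} - 0.64 x_{t-2} + e_t$ are assumptions made here, so the numbers need not match those quoted in the paper's example.

```python
import numpy as np

def ar2_autocorr(phi1, phi2, noise_var, nlags):
    """Autocorrelation r(0..nlags) of x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t."""
    rho1 = phi1 / (1.0 - phi2)            # Yule-Walker, lag 1
    rho2 = phi1 * rho1 + phi2             # Yule-Walker, lag 2
    r = np.zeros(nlags + 1)
    r[0] = noise_var / (1.0 - phi1 * rho1 - phi2 * rho2)
    if nlags >= 1:
        r[1] = rho1 * r[0]
    for k in range(2, nlags + 1):
        r[k] = phi1 * r[k - 1] + phi2 * r[k - 2]
    return r

def mv_approximant(r, n):
    """Order-n minimum variance approximant: a = -R^{-1} r, sigma2 = r(0) + a'r."""
    R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    rv = np.asarray(r[1:n + 1], dtype=float)
    a = -np.linalg.solve(R, rv)
    return a, float(r[0] + a @ rv)

# Regular AR(2) process alone: the approximant recovers the model exactly.
r_x = ar2_autocorr(1.13, -0.64, 1.0, nlags=2)
a_x, s2_x = mv_approximant(r_x, n=2)

# Mixed spectrum: add one sinusoid (A = 1, omega = pi/4) to the autocorrelation.
lags = np.arange(3)
r_y = r_x + 0.5 * np.cos(np.pi / 4 * lags)
a_y, s2_y = mv_approximant(r_y, n=2)
print(a_x, s2_x, a_y, s2_y)
```

With the sinusoid present, the order-2 prediction error variance can only grow relative to the innovation variance of the regular part, since two coefficients cannot cancel both components at once.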

THEOREM 1. (Li et al. [3]). Let $\hat r = [\hat r_0, \ldots, \hat r_n]^{tr}$
whose elements are given by (7). Then

$\sqrt{N}\,(\hat r - r) \to N(0, \Sigma)$  as  $N \to \infty$.

The elements of $\Sigma$ are complex expressions, and so are
omitted here for brevity. This recent (1994) result is
an extension of the 1990 result of [4] for white $X_t$ to
the colored case. It leads immediately to our first
result.

THEOREM  2.  If  for  any  sufficiently  large  N  we  have 
R>  c>0  almost  surely,  then 

<T^  ~  )  ^mv(^o  ~ J  * 

The proof of this theorem is similar to that in [5]
(p. 352) for the case where $U_t$ is absent. Our proof uses
similar  types  of  convergence  results  for  random 
variables  as  those  used  in  [5],  but  combined  with  a 
continuity  property  of  means  and  variances,  along 
with  Theorem  1  to  obtain  normality.  In  [5]  normality 
was  obtained  using  a  martingale  difference  argument 
which  does  not  apply  here.  This  theorem  also  extends 
the  result  of  [6],  which  is  essentially  the  same  as  that 
of  [5],  to  the  mixed  spectrum  setting. 

To  continue  on  to  the  statistics  of  the  AR(n)  spectral 

,  _  /  \  r  “  XTILJ  1 

estimator,  define  z  =  i  >  •  •  •  i  J  > 

^(e’")  =  Z*[l  ,0*''  ]  ^'’and  I*’’- 

Here,  *  denotes  the  conjugate  transpose.  Then 

2 

^  1  c*  /  \  ^mv 

We will also require the Capon nth order spectrum

$S_{cap}(\omega) = n / (z^* R^{-1} z)$.  (8)

Our  second  result  is  then  the  following  theorem. 
THEOREM  3.  For  large  N  we  have 

p(c“^)  ~  iV(  .  5cap(w)^ 

The  correlation  coefficient,  7,  between  ^(c’")  and  0^ 
is  given  by 


Main  Results 

We now summarize the key results in this work.
It follows from Theorems 2 and 3 that for large N the
AR(n) spectral estimator has a distribution which is
the ratio of a normal random variable and a non-central
chi-squared random variable with two degrees
of  freedom.  Moreover,  for  large  N  these  random 
variables  are  approximately  independent.  To  gain 
further  insight  into  this  estimator  we  use  a  first  order 
Taylor  expansion  to  obtain  approximate  expressions 
for  its  mean  and  variance.  This  results  in 

yjl  - - - 

N^S,ap{») 


Far[5( 


It must be emphasized that all of the above results
are for a specified model order, n. Explicit dependence
on n of any quantity has been omitted for notational
convenience only. Expressions (9) offer some
interesting insight. In particular, the nth order Capon
spectrum plays a role in both expressions. It is well
known that this spectrum can also be expressed as

$\frac{1}{S_{cap}(\omega)} = \frac{1}{n} \sum_{k=0}^{n-1} \frac{1}{S^{k}_{mv}(\omega)}$  (10)

where $S^{k}_{mv}(\omega)$ is the kth order minimum variance
spectral approximant. Since (10) involves a sort of
averaging of higher resolution AR spectra, it is also
well known that its resolution is notably less than the
AR(n) spectral approximant [7]. Hence, the mean (9a)
is liable to experience significant bias in the region of
strong narrowband spectral components, and in
particular, near point spectrum frequencies. The
variance (9b) will also be influenced in these regions,
possibly in an oscillatory manner, due to the spectral
oscillations in the AR(n) spectrum induced by the
point spectrum [7].

Next, we consider the behavior as $n \to \infty$.
From the fact that the AR(n) spectral
approximant converges to $S_x(\omega)$ at all points of
continuity, it follows that the nth order Capon
spectral approximant also converges to the same.
Thus, at frequencies sufficiently removed from the
point spectrum the above theorem, along with (9) and
(10), give the large n statistical description of the
AR(n) spectral estimator. They also show that the
condition $n/N \to 0$ is sufficient for reasonable behavior
for large order and data lengths. The difficulty in
identifying this large n behavior in the mixed
spectrum setting is greatest near the point spectrum
frequencies. In fact, it required two theorems in [4] to
rigorously prove that the AR(n) spectral approximant
becomes unbounded at these frequencies as $n \to \infty$.
This behavior, however, is a trivial consequence of the
convergence results in [2].

Example. For the process (1a) let $X_t$ be AR(2) with
a = [1.13, -0.64]', and let $U_t$ be a single sinusoid
with A = 1 and $\omega = \pi/4$. It follows that the AR(2)
mv approximant values are $a_{mv}$ = [-0.94, 0.57]' and
$\sigma^2_{mv}$ = 1.61. For N = 100 samples per record, Theorem 2
yields the predicted approximate distributions

a  ^  >r([  Jj]  ,  L]o04  !o07  ,   $\hat\sigma^2 \approx N(1.61, 0.034)$.

Using 1000 simulations of (1a) we obtained estimates

a  ir([  _Jy]  ,  LjoOS  *!007  3)  ,   $\hat\sigma^2 \approx N(1.64, 0.050)$.

The approximate normality of $\hat\sigma^2$ and $\hat a$ is reflected
in Figure 1 and Figure 2, respectively.

Figure 1. Histogram for $\hat\sigma^2$.

The accuracy of the mean approximation (9a) is
shown against the sample mean in Figure 3. A
comparison of (9b) and the sample variance, however,
revealed major differences at all but very high
frequencies far removed from the sinusoid. The sample
variance was two orders of magnitude higher than
(9b) in the region of the tone. At this stage it is not
known whether this difference is due to the sample
size, N, to the Taylor series approximation, or to
some combination of the two.
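
The averaging relation (10) between the Capon spectrum and the lower-order minimum variance spectra can be checked numerically. The Python sketch below assumes the conventions $S_{cap}(\omega) = n/(z^* R^{-1} z)$ with an n x n Toeplitz $R$ and $z = [1, e^{i\omega}, \ldots, e^{i(n-1)\omega}]^{tr}$, and the form $1/S_{cap} = (1/n)\sum_{k=0}^{n-1} 1/S^{k}_{mv}$; the function names and the test autocorrelation are hypothetical.

```python
import numpy as np

def toeplitz_cov(r, n):
    """n x n symmetric Toeplitz matrix built from autocorrelation lags r[0..n-1]."""
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def mv_spectrum(r, k, w):
    """Order-k minimum variance (AR) spectral approximant sigma_k^2 / |A_k(e^{iw})|^2."""
    if k == 0:
        return float(r[0])
    rv = np.asarray(r[1:k + 1], dtype=float)
    a = -np.linalg.solve(toeplitz_cov(r, k), rv)
    sigma2 = r[0] + a @ rv
    A = 1.0 + np.sum(a * np.exp(-1j * w * np.arange(1, k + 1)))
    return float(sigma2 / abs(A) ** 2)

def capon_spectrum(r, n, w):
    """Capon spectrum n / (z* R^{-1} z) evaluated directly."""
    z = np.exp(1j * w * np.arange(n))
    return float(n / np.real(np.conj(z) @ np.linalg.solve(toeplitz_cov(r, n), z)))

def capon_via_average(r, n, w):
    """Capon spectrum obtained by averaging reciprocals of orders 0..n-1."""
    return float(n / sum(1.0 / mv_spectrum(r, k, w) for k in range(n)))

# Mixed-spectrum autocorrelation: a decaying part plus one sinusoid at pi/4.
lags = np.arange(8)
r = 0.8 ** lags + 0.5 * np.cos(np.pi / 4 * lags)
for w in (0.3, np.pi / 4, 2.0):
    print(w, capon_spectrum(r, 6, w), capon_via_average(r, 6, w))
```

Because the average includes the low-order (and therefore smooth) spectra, the Capon value near the tone is pulled away from the sharper order-n AR value, which is the resolution difference the text relies on.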
Figure 2. Histograms for $\hat a_1$ (top) and $\hat a_2$ (bottom).


Conclusions 

The above results provide a partial description of the
statistical behavior of AR spectral estimators for
random processes having mixed spectrum. The
example illustrated the claims of mv parameter
estimator normality for record sizes as small as
N = 100 samples. Furthermore, the approximate
expression for the mean of the AR spectrum compared
well against the sample mean. While not shown here,
investigation of the normal distribution claim in
Theorem 3 also proved reasonable for the above
example. However, the variance expressions in both (9b)
and Theorem 3 were nowhere near the sample
variances at any but the very highest frequencies. A
further investigation for this and other ARMA types
of noise processes supported the rate dependence of
the variance on N, but actual variances were
dramatically different from predicted ones, except in
the simplest cases where the noise was essentially
white.


Figure 3. Comparison of (9a) and simulation estimate.


Abstract

This paper presents a new spectral estimation method
for time series with missed observations. An autoregressive
(AR) modeling approach is adopted. The AR parameters
are estimated by optimizing a weighted mean-square
error criterion. The method can be used in real-time, adaptive
contexts where the AR parameters are time varying. In
general, both regularly and randomly missed observations
can be handled by this method. The spectral estimates are
compared to those obtained by well known AR parameter
estimators used in the cases where none of the signal samples
is missed. The performance of the method is illustrated by
some numerical examples.


1.  Introduction 

In many practical situations, periodically sampled signals
with missed observations may be encountered. This
can be caused by a variety of reasons, such as accidental loss
of some portions of data, failure of the measurement equipment,
etc. In some applications where data compression is
needed, one may wish to reduce the total number of data
samples. This may result in a periodically sampled signal
with "missed" observations.

Some  important  recent  works  in  this  field  are  [1][2][3]. 
Jones  [1]  has  developed  a  maximum  likelihood  algorithm 
for  ARMA  time  series  with  missed  observations.  He  uses 
state-space  representation  and  Kalman  filtering  to  compute 
the  likelihood  function  of  the  ARMA  parameters  and  this 
function  is  then  maximized  using  some  non  linear  optimiza¬ 
tion  procedure.  Rozen  and  Porat  [2]  have  developed  an 
algorithm  for  the  problem  of  spectral  estimation  through 
the  ARMA  modeling  of  stationary  processes  with  missing 
observations.  This  algorithm  is  asymptotically  optimal  in 
the sense of achieving the smallest error variance when the
number  of  data  approaches  infinity.  All  of  the  mentioned 
methods  handle  only  stationary  time  series  and  cannot  be 


used  in  an  adaptive  context  where  the  AR  parameters  are 
time  varying. 

In  this  paper,  we  present  a  new  method  of  AR  spectral  esti¬ 
mation  when  the  data  are  not  consecutive,  but  some  of  the 
observations  are  missed.  In  general,  both  regularly  and  ran¬ 
domly  missed  observations  can  be  handled  by  this  method 
[4].  The  method  is  based  on  non-linear  optimization  of  a 
weighted  squared  error  criterion.  All  the  formulae  obtained 
are  recursive,  and  real-time  spectral  estimation  of  non  sta¬ 
tionary  signals  can  also  be  handled  [5]. 

2.  Description  of  the  method 

The basic idea of this method is very similar to that of
the methods used in adaptive identification contexts (RLS,
LMS, ...). For convenience and without loss of generality,
in what follows, we suppose that the sampling period
is equal to 1.

We suppose that $\{y_n\}$ is a discrete time zero-mean AR process
defined as follows:

$y_n = \theta^T \mathbf{y}_n + v_n$  (1)

where $v_n$ is a zero-mean white process with variance $\sigma^2$,
$\theta = [\theta_1, \ldots, \theta_M]^T$ is the vector of the AR parameters and
$\mathbf{y}_n = [y_{n-1}, \ldots, y_{n-M}]^T$ is the vector of the last M signal
samples, M being the order of the AR model. We suppose
that the signal $\{y_n\}$ is subjected to random skipping or deletion
of some samples. Let $\{t_1, \ldots, t_N\}$ be the set of instants
where the signal samples are not missed. Our aim is to compute
the vector $\theta$ that minimizes the following cost function:

$J_{t_n} = \sum_{i=1}^{n} w_{t_i} e_{t_i}^2 = \sum_{i=1}^{n} w_{t_i} (y_{t_i} - \hat y_{t_i})^2$  (2)

where $e_{t_i}$ is the prediction error at instant $t_i$ and $\hat y_{t_i}$ is the
estimate of $y_{t_i}$. In order to compute the value of $\hat y_{t_i}$, we use
a well known result of prediction theory that is recalled
below.

The optimal k-step-ahead linear predictor. Let $\{y_n\}$ be
defined as in (1) and $y_n, \ldots, y_{n-M}$ be known. The best linear
mean square estimate of $y_{n+k}$ is obtained by the following
recursion:

$\hat y_{n+k} = \sum_{i=1}^{M} \theta_i \hat y_{n+k-i}$  (3)

This means that at each instant $t_i$, in order to obtain the value
of $\hat y_{t_i}$, one has to use the recursion (3) for $n = t_{i-1}$ and for
$k = 1, \ldots, t_i - t_{i-1}$. In addition, each missed sample $y_j$
where $j < t_{i-1}$, has to be replaced by its estimated value.
The  algorithm  can  be  summarized  as  below  : 

1.  Computation  of  yt^  at  each  instant  tn,  using  the  opti¬ 
mal  linear  predictor  described  above, 

2.  Non  linear  optimization  of  the  cost  function  Jt^ , 

3.  Prediction of missed samples between instants $t_n$ and
$t_{n+1}$ using the last estimated AR parameters.
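
Steps 1 and 3 above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the assumption that the first M samples are observed, and the exponential weights `lam ** (n - 1 - i)` are choices made here. The optimization of step 2 is left out; the sketch only evaluates the cost for a given parameter vector.

```python
import numpy as np

def predict_through_gaps(y, observed, theta):
    """Run the AR predictor over the record, filling each missed sample with its
    prediction and collecting prediction errors at the observed instants.
    Assumes the first M samples are observed (hypothetical start-up rule)."""
    theta = np.asarray(theta, dtype=float)
    M = len(theta)
    filled = np.array(y, dtype=float)
    errors = []
    for n in range(M, len(filled)):
        past = filled[n - M:n][::-1]          # [y_{n-1}, ..., y_{n-M}], gaps already filled
        pred = float(theta @ past)
        if observed[n]:
            errors.append(filled[n] - pred)   # e_{t_i} = y_{t_i} - yhat_{t_i}
        else:
            filled[n] = pred                  # replace the missed sample by its estimate
    return filled, np.array(errors)

def weighted_cost(errors, lam=1.0):
    """Weighted sum of squared errors; the most recent error has weight 1."""
    n = len(errors)
    weights = lam ** np.arange(n - 1, -1, -1)
    return float(np.sum(weights * errors ** 2))

# Noise-free AR(1) record with three missed samples (marked NaN).
y_true = 0.9 ** np.arange(10)
mask = np.array([True, True, False, False, True, True, False, True, True, True])
y_obs = y_true.copy()
y_obs[~mask] = np.nan

filled, e_true = predict_through_gaps(y_obs, mask, [0.9])
_, e_wrong = predict_through_gaps(y_obs, mask, [0.5])
print(weighted_cost(e_true, lam=0.99), weighted_cost(e_wrong, lam=0.99))
```

Because a missed sample is filled by a prediction that itself depends on the parameters, the error after a gap of length d involves powers of the parameters up to d + 1, which is why the cost is not quadratic.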

At step 2, one has to compute the gradient of the cost function
$J_{t_n}$. This is subsequently used in some non linear
optimization procedure to minimize $J_{t_n}$. A formal description
of the algorithm is given in [6]. Details of the gradient
computation can be found in [4] [7] and are not given here;
what is important is that the gradient can be updated recursively
at each instant $t_n$. In addition, the use of an iterative
optimisation procedure (descent algorithms such as gradient
or variable-metric methods) together with an exponential
weighting factor affords the possibility
of operating in non-stationary environments.

3.  Some  discussions  about  the  cost  function 

Eq. 3 shows that $\hat y_{t_n}$ is a polynomial function of $\theta$ and
hence $J_{t_n}$ is not a quadratic cost function, as might
superficially be expected; it is rather a polynomial whose degree
at instant $t_n$ depends on the number of missed observations
until $t_n$. This may cause convergence to a
local minimum and not necessarily to the global one. One
solution may be to repeat the algorithm with several initial
values to increase the chance of finding the global minimum.
However, the cost function $J_{t_n}$ has some interesting properties,
at least in some special cases. For example, the following
proposition has been proved [4].

Proposition. Suppose that $\{y_n\}$ is an arbitrary AR(1) process
with parameter $\theta^*$. Assume that the random pattern of
misses is a Bernoulli-type one in which each measurement
has a fixed probability $q = 1 - p$ of being missed and that
the misses are independent. If we define the cost function as
below:

$J_{t_n} = E(w_{t_n} e_{t_n}^2) = E(w_{t_n} (y_{t_n} - \hat y_{t_n})^2)$  (4)

where $\hat y_{t_n}$ is obtained by the method described above and if
we set $\hat y_1 = y_1$, then $J_{t_n}$ is a convex polynomial of degree
n − 1 with the minimum at $\theta = \theta^*$.
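
A short worked calculation shows why $\theta^*$ is a natural minimizer in the AR(1) case. The sketch assumes a single error term in which the last observed sample lies $d$ steps back and the $d - 1$ samples in between are missed, so that the recursion (3) gives $\hat y_t = \theta^d y_{t-d}$; it illustrates the location of the minimum only and does not reproduce the convexity proof of [4].

```latex
\begin{aligned}
r_y(d) &= (\theta^{*})^{d}\, r_y(0), \\
E\big(y_t - \theta^{d} y_{t-d}\big)^2
  &= r_y(0) - 2\,\theta^{d}\, r_y(d) + \theta^{2d}\, r_y(0) \\
  &= r_y(0)\Big[\big(\theta^{d} - (\theta^{*})^{d}\big)^2 + 1 - (\theta^{*})^{2d}\Big].
\end{aligned}
```

The bracket is smallest when $\theta^{d} = (\theta^{*})^{d}$, which holds at $\theta = \theta^*$ for every gap length $d$, and the remaining value $r_y(0)\,[1 - (\theta^{*})^{2d}]$ is the usual $d$-step prediction error variance of an AR(1) process.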

The  proposition  describes  the  statistical  behavior  of 
the  present  method  in  the  case  of  AR(1)  processes.  In 
several  examples  tested  in  the  case  of  AR(2)  processes, 
only  one  minimum  has  been  observed  for  the  cost  function. 
The  extension  of  the  proposition  to  AR(p)  processes  has  not 
yet  been  done.  However,  the  simulations,  partly  discussed 
in  the  following  section,  give  satisfying  results  in  the  cases 
where  the  AR  process  has  a  larger  order. 

4.  Simulations 

In all the examples, we consider the random Bernoulli
pattern of misses where it is supposed that each sample has
the probability q = 1 − p of being missed and the misses
are independent.

Example 1 In this example we illustrate the performance
of the proposed algorithm in spectral reconstruction. In each
case, and as a reference for comparison, the spectral estimate
obtained by a classical AR estimator in the case where none
of the samples is missed is also given. The approach used in
this case is the forward-backward approach, where the sum
of the least-squares criterion for a forward model and the analogous
criterion for a time-reversed model is minimised [8].
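
The forward-backward criterion used for the reference estimate can be sketched as a single stacked least-squares problem. The Python code below is an illustration under the convention $y_n \approx \sum_i \theta_i y_{n-i}$; the function name `forward_backward_ar` is hypothetical and the sketch is not taken from [8].

```python
import numpy as np

def forward_backward_ar(y, M):
    """Least-squares AR(M) fit minimising the sum of forward and backward
    squared prediction errors over a fully observed record."""
    y = np.asarray(y, dtype=float)
    rows, targets = [], []
    for n in range(M, len(y)):
        rows.append(y[n - M:n][::-1])      # forward: predict y[n] from y[n-1], ..., y[n-M]
        targets.append(y[n])
        rows.append(y[n - M + 1:n + 1])    # backward: predict y[n-M] from y[n-M+1], ..., y[n]
        targets.append(y[n - M])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta

# A pure sinusoid obeys y[n] = 2*cos(w)*y[n-1] - y[n-2] in both time directions.
w = 2 * np.pi * 0.1
y = np.cos(w * np.arange(50) + 0.4)
theta = forward_backward_ar(y, M=2)
print(theta)
```

Stacking the time-reversed equations doubles the number of rows available for a given record length, which is the usual motivation for this criterion on short records.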

The  first  test  spectrum  is  a  two  peak  one.  It  is  supposed 
that  the  two  peaks  represent  the  sum  of  two  zero-mean  inde¬ 
pendent  signals.  Each  signal  is  obtained  by  filtering  white 
noise  by  a  first  order  Butterworth  filter.  The  experiment  is 
repeated  100  times,  each  time  using  an  independent  realisa¬ 
tion  of  the  test  signal.  The  average  spectral  estimate  is  ob¬ 
tained  by  computing  the  average  of  the  estimates  over  these 
100  independent  trials  of  the  experiment.  The  normalised 
frequencies  and  bandwidths  of  the  peaks  are  : 

$f_1 = 0.3$, $\Delta f_1 = 0.005$, $f_2 = 0.35$, $\Delta f_2 = 0.005$  (5)

The probability of missing each sample is q = 0.4.

There  is  approximately  15  dB  of  difference  between  the 
amplitudes  of  the  sharp  peaks.  Fig.1  shows  plots  of  the 
average  estimated  spectra  for  both  cases  :  with  and  without 
missed  samples.  We  note  the  correct  reconstruction  of  more 
informative  portions  of  the  spectrum. 

The second test spectrum is that of the vowel "i" (in
French!) spoken by a male speaker. The probability of
missing each sample is q = 0.4. The estimated spectra are
shown in Fig. 2.

Figure 1. Spectral reconstruction. Original PSD and
estimated PSD for q = 0% and q = 40% (M = 20).

Figure 2. Spectral reconstruction. Estimated PSD
for q = 0% and q = 40% (M = 20).

Example 2 In this example the performance of the proposed
algorithm in non-stationary environments is studied.
In order to test the parameter tracking capacity of the
method, we have considered a sinusoid that is subjected to
an abrupt change in frequency as demonstrated in Fig. 3.
The period of sampling is T = 1. The AR model order is
M = 2. Fig. 4 shows the time-frequency evolution of the
spectral estimates. We note the correct estimation of the
frequencies $f_1 = 0.3$ and $f_2 = 0.1$. The evolution of the
AR parameters as a function of time is shown in Fig. 5. We
note that the choice of a forgetting factor λ = 0.99 increases
considerably the tracking capacity of the algorithm.

Figure 3. The frequency variation of the test
signal in example 2.

Figure 4. Time-frequency evolution of the
spectral estimate for the test signal in example 2.
λ = 0.99, q = 40%.

Example 3 The convergence behavior of the mean
squared prediction error for different values of q is illustrated
here. The test signal is an AR(2) process with the
parameters  6^  ^  _  q  3  q  pjg  5  shows  the  average 

results obtained from 100 independent realisations of the
AR(2) process. It is important to note that the speed of
convergence is the same for different values of q. Clearly,
the residual error is greater for larger values of q. This is
because of the accumulation of the errors due to
missed sample estimations.

Figure 5. AR estimation for the test signal in
example 2, for (λ = 0.99, q = 0%), (λ = 0.99, q = 40%)
and (λ = 1, q = 0%).

Figure 6. Convergence behavior of the mean
squared error (M = 2) for q = 0%, 20%, 30% and 40%.

From the previous and numerous other examples, the
following points are noted:

•  The performance of the AR estimators in both
cases (with and without missed samples) is similar,
particularly in the more informative zones of the spectrum.

•  In the case where some of the samples are lost, a residual
noise level is observed in the spectral estimates.
This becomes more pronounced as the number of lost
samples increases. The level is situated at −40 dB for
the signals in example 1. The reason is
the lack of information from the signal.


•  A  higher  order  AR  estimator  is  needed  to  resolve 
neighbouring  spectral  peaks  with  the  same  fidelity  as 
in  single-peak  cases.  For  peaks  with  greater  amplitude 
ratios,  higher  model  orders  should  be  used. 

•  In  the  case  of  spectra  with  larger  bandwidths,  one  must 
choose  larger  model  orders  in  order  to  have  spectral  es¬ 
timates  with  the  same  fidelity  as  in  the  case  of  sharp 
peak  spectra.  This  is  because  the  AR  models  are  less 
adapted  to  these  kinds  of  spectra  than  those  with  sharp 
peaks. 

5.  Conclusion 

We  presented  a  parametric  spectral  estimation  technique 
for  signals  with  incomplete  data  based  on  AR  modeling. 
The  method  is  adaptive  and  can  be  applied  to  non  stationary 
cases.  Both  regularly  and  randomly  missed  data  can  be  han¬ 
dled.  Simulation  results  show  the  high  performance  of  this 
method even in the cases where a large number of samples
is lost.


Abstract

An optimum Block Modified Covariance Algorithm
is developed for computing time-varying autoregressive
(AR) parameters. The method presented here differs
from those presented previously [3] in that it uses optimally
selected time-varying convergence factors such
that the block mean square error is minimized from one
iteration to the next. In particular, the algorithm developed
here, called Block Modified Covariance Algorithm
with individual adaptation of parameters (BMCAI),
uses individual time-varying convergence factors computed
using modified covariance matrix approximations
along with the Gauss-Seidel method. Even though the
BMCAI is gradient based, it retains the attractive spectral
matching properties of fixed-window least squares
modified covariance algorithms while at the same time
providing capabilities for time-varying spectral estimation.


1.  Introduction 

This  paper  is  concerned  with  the  development  of  an 
efficient  algorithm  for  least-squares  forward-backward 
prediction  (FBP).  Unconstrained  FBP  requires  ma¬ 
trix  inversion  and  most  of  the  originally  proposed  al¬ 
gorithms  compute  AR  parameters  based  on  a  fixed- 
window  approach.  Marple  developed  a  fast  Cholesky 
algorithm  (FCA)  which  requires  O(p^)  operations  and 
more  recently  a  fast  QR  algorithm  (FQRA)  [1]  which 
was  shown  to  have  improved  numerical  behavior  rela¬ 
tive  to  the  FCA.  The  fast  inversion  algorithms  [1]  are 
order  recursive  and  operate  on  a  fixed  N-point  record, 
i.e.,  they  are  non-adaptive.  A  family  of  fixed-order 
sliding-window  block  gradient  algorithms  for  FBP, 
namely the block modified covariance algorithms (BMCAs),
were proposed recently by Spanias [3]. In particular, the
BMCA worked reasonably well in a series of "benchmark"
simulations; however, its performance deteriorated
considerably in scenarios requiring estimation of the
spectral content of multiple closely-spaced sinusoids.
This is mainly because the BMCA uses a single convergence
factor (or step size $\mu$), which does not allow for fast
adaptation in cases where the modified covariance matrix
has large eigenvalue disparity. In this paper, we
concentrate on the development of multiple convergence
factors for adapting the AR parameters. The use of
multiple convergence factors in adaptive FBP was motivated
by work done in adaptive FIR system identification by
Mikhael et al. [2]. The differences between the algorithms
presented in this paper and those presented by Mikhael
are: a) the algorithms presented here are intended for
modified covariance linear prediction, in which the
structure of the equations to be solved is distinctly
different from that encountered in FIR system
identification; b) the algorithms presented are studied in
the context of spectral estimation applications and deal
with the idiosyncrasies of some complex spectral
estimation examples such as multiple closely spaced
sinusoids; and c) the proposed methods go a step beyond
Mikhael's work in the sense that the computation of the
individual convergence factors is done efficiently using
fast and stable Gauss-Seidel numerical methods tailored
specifically to deal with the structure of the modified
covariance equations. The latter is the most important
contribution of the paper in that it provides
opportunities for reducing the complexity of the
algorithms by using approximations of the modified
covariance matrix while maintaining the attractive
performance characteristics of least-squares MC spectral
estimators.

The rest of the paper is organized as follows. Section 2
presents the BMCA and Section 3 describes an algorithm
that uses individual step sizes for adapting the AR
parameters (BMCAI). An efficient Gauss-Seidel iterative
procedure for computing the optimum convergence factors is
also presented in that section. Section 4 presents
simulations using the BMCAI and Section 5 gives the
conclusions.

2.  The  Block  MC  Algorithm 

In this section, a general technique for formulating
the BMCAI is presented. We begin by defining the
following parameters: let $i$ be the block index, $p$ the
order of the AR model, $N$ the number of samples for
prediction, $2L$ the length of the processed block, $n$ the
time index, $a_k(i)$ the $k$-th adjustable parameter in the
$i$-th block ($k = 1, 2, 3, \ldots, p$), $x(n)$ the input
signal for linear prediction (adaptive filter), $e_\ell(i)$
the $\ell$-th error signal in the $i$-th block
($\ell = 1, 2, \ldots, 2L$), and $S$ the number of samples
per block shift.

At the $i$-th iteration, the objective is to minimize
the cost function $J(i+1) = \tfrac{1}{2}\, e_{fb}^T(i+1)\, e_{fb}(i+1)$, where
the $2L \times 1$ error vector $e_{fb}(i)$ is given by

$$e_{fb}(i) = [\, e_f(iS+p+1) \;\cdots\; e_f(iS+N) \;\;\; e_b(iS+p+1) \;\cdots\; e_b(iS+N) \,]^T \tag{1}$$

and $e_f(n)$ and $e_b(n)$ are the forward and backward
prediction errors

$$e_f(n) = x(n) - \sum_{k=1}^{p} a_k(i)\, x(n-k), \tag{2}$$

$$e_b(n) = x(n-p) - \sum_{k=1}^{p} a_k(i)\, x(n-p+k). \tag{3}$$

Equations (1), (2) and (3) can be written block-wise
as

$$e_{fb}(i) = x(i) - X_{fb}(i)\, a(i) \tag{4}$$

where the $2L \times 1$ vector $x(i)$ is given by

$$x(i) = [\, x_f(iS+p+1) \;\cdots\; x_f(iS+N) \;\;\; x_b(iS+1) \;\cdots\; x_b(iS+N-p) \,]^T \tag{5}$$

and the $2L \times p$ matrix $X_{fb}(i)$ and $p \times 1$ vector $a(i)$ are
defined by

$$X_{fb}(i) = \begin{bmatrix}
x(iS+p) & \cdots & x(iS+1) \\
x(iS+p+1) & \cdots & x(iS+2) \\
\vdots & & \vdots \\
x(iS+N-1) & \cdots & x(iS+N-p) \\
x(iS+2) & \cdots & x(iS+p+1) \\
x(iS+3) & \cdots & x(iS+p+2) \\
\vdots & & \vdots \\
x(iS+N-p+1) & \cdots & x(iS+N)
\end{bmatrix}, \qquad
a(i) = \begin{bmatrix} a_1(i) \\ a_2(i) \\ \vdots \\ a_p(i) \end{bmatrix}. \tag{6}$$

The BMCA uses the following update formula:
$a(i+1) = a(i) - \mu \nabla_{fb}(i)$, with
$\nabla_{fb}(i) = -\frac{1}{L} X_{fb}^T(i)\, e_{fb}(i)$.
The condition for convergence of the algorithm is
$0 < \mu < 2L/\lambda_{max}$, where $\lambda_{max}$ is the largest
eigenvalue of $E(X_{fb}^T(i) X_{fb}(i))$.

3.  The BMCAI

In this section, we propose the use of individual
convergence factors that are optimally chosen to adapt
individual filter parameters. The step sizes are updated
at each block iteration.

3.1.  Problem formulation

We now consider the relation

$$a(i+1) = a(i) - M(i)\, \nabla_{fb}(i) \tag{7}$$

to update the parameters, where $M(i)$ is a $p \times p$
diagonal matrix containing the $p$ convergence factors, i.e.,

$$M(i) = \mathrm{diag}\big(\mu_1(i), \ldots, \mu_p(i)\big). \tag{8}$$

As in all block gradient algorithms, the block gradient
vector $\nabla_{fb}(i)$ is replaced by an estimated block
gradient vector which is given by

$$\nabla_{fb}(i) = \frac{1}{L} \frac{\partial J(i)}{\partial a(i)} = -\frac{1}{L} X_{fb}^T(i)\, e_{fb}(i). \tag{9}$$

From (7), (8), (9) one obtains the general form of the
parameter updating formula in matrix vector form as:

$$a(i+1) = a(i) + \frac{1}{L} M(i)\, X_{fb}^T(i)\, e_{fb}(i). \tag{10}$$

In the parameter update (10), there are $p$ individual
time-varying convergence factors, $\mu_k(i)$ ($k = 1, 2, \ldots, p$).
These factors are chosen at each iteration $i$ so as to
minimize the functional $J(i+1)$. To this end, the forward
and backward errors are expanded using the truncated
Taylor series

$$\begin{aligned}
e_{fb}(i+1) &= e_{fb}(i) + \frac{\partial e_{fb}(i)}{\partial a(i)}\, \Delta a(i) \\
&= e_{fb}(i) - X_{fb}(i)\big(a(i+1) - a(i)\big) \\
&= e_{fb}(i) - \frac{1}{L} X_{fb}(i)\, M(i)\, X_{fb}^T(i)\, e_{fb}(i) \\
&= e_{fb}(i) - X_{fb}(i)\, M(i)\, q(i)
\end{aligned} \tag{11}$$

with $q(i) = \frac{1}{L} X_{fb}^T(i)\, e_{fb}(i) = -\nabla_{fb}(i)$. Here the
partial derivative $\partial e_{fb}(i) / \partial a(i)$ is obtained from (4)
and reduces to $-X_{fb}(i)$.
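As an illustration of equations (1) to (6) and of the single-step-size BMCA update, the following minimal sketch builds the forward-backward data matrix for one block and applies one gradient step. It is not the authors' implementation: the function names are hypothetical, the signal is stored so that `x[n]` holds $x(n)$, and the $1/L$ gradient scaling is an assumption taken from the convergence bound $0 < \mu < 2L/\lambda_{max}$.

```python
import math

def fb_block(x, i, S, N, p):
    """Build X_fb(i) (2L x p) and the data vector x(i) (length 2L), L = N - p.

    x[n] holds the sample x(n); samples x(iS+1) .. x(iS+N) are used.
    """
    base = i * S
    X, d = [], []
    for n in range(base + p + 1, base + N + 1):      # forward rows, eq. (2)
        X.append([x[n - k] for k in range(1, p + 1)])
        d.append(x[n])
    for n in range(base + p + 1, base + N + 1):      # backward rows, eq. (3)
        X.append([x[n - p + k] for k in range(1, p + 1)])
        d.append(x[n - p])
    return X, d

def fb_error(X, d, a):
    """e_fb(i) = x(i) - X_fb(i) a(i), eq. (4)."""
    return [dl - sum(xk * ak for xk, ak in zip(row, a)) for row, dl in zip(X, d)]

def bmca_step(X, d, a, mu):
    """One BMCA update a(i+1) = a(i) + mu * q(i), q = (1/L) X_fb^T e_fb (assumed scaling)."""
    L = len(X) // 2
    e = fb_error(X, d, a)
    q = [sum(row[k] * el for row, el in zip(X, e)) / L for k in range(len(a))]
    return [ak + mu * qk for ak, qk in zip(a, q)]

# Usage: a noise-free sinusoid obeys x(n) = 2cos(w) x(n-1) - x(n-2) in both
# time directions, so a = [2cos(w), -1] zeroes forward and backward errors.
w = 0.3
x = [math.cos(w * n) for n in range(21)]   # x[0] is stored but unused for i = 0
X, d = fb_block(x, i=0, S=1, N=20, p=2)
e_exact = fb_error(X, d, [2 * math.cos(w), -1.0])
a1 = bmca_step(X, d, [0.0, 0.0], mu=0.1)
```

With a single step size the update moves all parameters along the same scaled gradient, which is the limitation the individual convergence factors of Section 3 are meant to remove.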
The next step is to choose $M(i)$ such that $J(i+1)$
is minimized. This is done by setting

$$\frac{\partial J(i+1)}{\partial \mu_k(i)} = 0 \tag{12}$$

for $k = 1, \ldots, p$. This leads to the system of equations

$$\sum_{j=1}^{p} [R(i)]_{kj}\, \mu_j(i)\, q_j(i) = q_k(i) \tag{13}$$

for $k = 1, \ldots, p$, or

$$R(i)\, M(i)\, q(i) = q(i), \tag{14}$$

where $R(i) = \frac{1}{L} X_{fb}^T(i) X_{fb}(i)$. Equivalently,

$$M(i)\, q(i) = R^{-1}(i)\, q(i). \tag{15}$$

Therefore the updating formula (7) becomes

$$a(i+1) = a(i) + M(i)\, q(i) = a(i) + R^{-1}(i)\, q(i). \tag{16}$$

The last equation is the weight update equation for the
BMCAI. Its main drawback is the requirement of computing
the solution of a system of equations of order $p$. The
associated cost can become prohibitive, especially for
high-order prediction. The following section gives an
approach which can be used to approximate $R^{-1}(i) q(i)$
in an efficient manner.
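The step from (12) to (16) can be spelled out as follows. This is a worked sketch rather than the authors' derivation: it assumes $J(i+1) = \tfrac{1}{2} e_{fb}^T(i+1) e_{fb}(i+1)$ and $R(i) = \tfrac{1}{L} X_{fb}^T(i) X_{fb}(i)$, writes $x_k$ for the $k$-th column of $X_{fb}(i)$ and $x$ for the data vector of (5), and drops the block index on the right-hand sides.

```latex
\begin{align*}
e_{fb}(i+1) &= e_{fb} - X_{fb} M q = e_{fb} - \sum_{k=1}^{p} \mu_k\, q_k\, x_k, \\
\frac{\partial J(i+1)}{\partial \mu_k}
  &= -\,q_k\, x_k^T \left( e_{fb} - X_{fb} M q \right) = 0, \qquad k = 1, \ldots, p, \\
\Rightarrow \quad X_{fb}^T X_{fb}\, M q &= X_{fb}^T e_{fb} \quad (q_k \neq 0)
  \;\Longleftrightarrow\; R\, M q = q, \\
a(i+1) &= a + R^{-1} q
        = a + (X_{fb}^T X_{fb})^{-1} X_{fb}^T (x - X_{fb}\, a)
        = (X_{fb}^T X_{fb})^{-1} X_{fb}^T x .
\end{align*}
```

The scale factor in $R$ and $q$ cancels, so under these assumptions (and with $X_{fb}^T X_{fb}$ invertible) the exact update (16) lands on the block least-squares solution in one step. This is consistent with the claim that the BMCAI retains the properties of fixed-window least-squares modified covariance estimators.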


3.2.  Implementation  via  a  Gauss-Seidel  It¬ 
eration 


The matrix inversion for computing the vector
$R^{-1}(i) q(i)$ in (16) can be avoided altogether by solving
the system

$$R(i)\, z(i) = q(i) \tag{17}$$

for $z(i)$ via an iterative method (which only requires
matrix-vector products), then updating

$$a(i+1) = a(i) + z(i). \tag{18}$$

More precisely, $z(i)$ is replaced by $z^{(k)}(i)$ obtained by
applying $k$ iterations

$$z^{(k)}(i) = z^{(k-1)}(i) + Q^{-1}(i) \left[ q(i) - R(i)\, z^{(k-1)}(i) \right] \tag{19}$$

starting with a given vector $z^{(0)}(i)$. Here $Q(i)$ is a
matrix approximating $R(i)$. Since the system (17) is
symmetric and generally positive definite, for efficiency
we will only consider Gauss-Seidel iterations, i.e.,

$$Q(i) = D(i) + L(i), \tag{20}$$

where $D(i)$ and $L(i)$ are the diagonal and (strictly)
lower triangular parts of $R(i)$, respectively. Note that
the matrix $R(i)$ is not always diagonally dominant (at
least for the input data used), which explains why the
Jacobi method (corresponding to $Q(i) = D(i)$) did
not converge when applied to (17). In our experiments
only 2 or 3 iterations were sufficient to obtain a good
approximation of $z(i)$ when starting with $z^{(0)}(i) = 0$.

For two iterations, this is equivalent to approximating
$z(i)$ by

$$z^{(2)}(i) = \big(D(i) + L(i)\big)^{-1} \left( q(i) - L^T(i) \big(D(i) + L(i)\big)^{-1} q(i) \right). \tag{21}$$

In order to reduce the computational complexity of the
algorithm the sum $D(i) + L(i)$ can be directly updated
without forming $R(i+1)$, by considering the lower
triangular part (including the diagonal) of the recursion

$$R(i+1) = R(i) + V^T(i+1)\, W(i+1) \tag{22}$$

i.e.,

$$D(i+1) + L(i+1) = D(i) + L(i) + Y(i+1) \tag{23}$$

where $Y(i+1)$ is the lower triangular part of
$V^T(i+1) W(i+1)$. Note that with $V(i)$ and $W(i)$ defined as

$$V(i),\; W(i):\ \text{matrices built from the rows } x_1(i),\; x_L(i),\; x_{L+1}(i),\; x_{2L}(i) \tag{24}$$

where $x_\ell(i)$ denotes the $\ell$-th row of $X_{fb}(i)$
($\ell = 1, \ldots, 2L = 2(N-p)$), then $Y(i+1)$ can be computed
efficiently.

The computational complexity of the BMCAI relative
to that of the BMCA is given in Table 1.

Table 1. Computational complexity of the BMCA
algorithms, L = N - p.

Algorithm   Multiplies              Additions
BMCA        4Lp + p                 4Lp
BMCAI       p(4N - 3p/2 + 3/2)      p(4N - 3p/2 - 1/2) + 1

4.  Simulation Results

The performance of the BMCAI is examined in
terms of its ability to resolve closely-spaced sinusoids
embedded in noise. The PSD obtained using the BMCAI
compared favorably against that obtained with the BMCA.
In Fig. 1, we show a simulation with 10 closely-spaced
spectral peaks of a process given by

$$x(n) = \sum_{i=1}^{10} A_i \cos(\omega_i n) + W(n) \tag{25}$$

for $n = \ldots$, with $A_i = 0.1\,i$, $\omega_i = \ldots$
and $Q = 10^{-\ldots}$ (noise variance). Here $f_s = 64$ is
the sampling frequency (in Hertz) and $W(n)$ a
pseudo-random white-noise sequence. The prediction order
was taken to be equal to 32. The plots in Figure 1 are
formed by overlaying the spectra obtained using the BMCAI
with individual adaptation of parameters based on
Gauss-Seidel iterations, for 10 independent realizations.
Each realization is a 100-sample record of the above input
time series. The relative phases change randomly from
realization to realization. Note that although the
sinusoids are very closely spaced in frequency and the
available data records are quite short, the BMCAI
accurately tracks the frequencies (Fig. 1a and b) without
missing any spectral peak. The BMCA, on the other hand
(Fig. 1c), fails to resolve one of the peaks.
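The simulations above rely on the Gauss-Seidel approximation of Section 3.2. A minimal sketch of that iteration is given below, together with a check that two sweeps started from zero agree with the closed form (21). The function names and the small test matrix are illustrative assumptions, not taken from the paper.

```python
def gauss_seidel(R, q, sweeps=2):
    """Approximate the solution of R z = q with Gauss-Seidel sweeps, z^(0) = 0.

    Each in-place sweep equals z <- z + Q^{-1} (q - R z) with Q = D + L.
    """
    p = len(q)
    z = [0.0] * p
    for _ in range(sweeps):
        for k in range(p):
            s = sum(R[k][j] * z[j] for j in range(p) if j != k)
            z[k] = (q[k] - s) / R[k][k]
    return z

def solve_lower(R, b):
    """Forward substitution for (D + L) y = b, using only the lower triangle of R."""
    y = []
    for k in range(len(b)):
        y.append((b[k] - sum(R[k][j] * y[j] for j in range(k))) / R[k][k])
    return y

def two_sweep_closed_form(R, q):
    """Eq. (21): z2 = (D+L)^{-1} (q - L^T (D+L)^{-1} q) for symmetric R."""
    p = len(q)
    y = solve_lower(R, q)
    # (L^T y)_k uses the strictly upper part of R, equal to L^T by symmetry
    r = [q[k] - sum(R[k][j] * y[j] for j in range(k + 1, p)) for k in range(p)]
    return solve_lower(R, r)

# Usage on a small symmetric positive definite system (illustrative values)
R = [[4.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 2.0]]
q = [1.0, 2.0, 3.0]
z2 = gauss_seidel(R, q, sweeps=2)
z2_closed = two_sweep_closed_form(R, q)
z_many = gauss_seidel(R, q, sweeps=50)
```

Only the lower triangle $D + L$ has to be stored and inverted by substitution, which is why the recursion (23) updates that part directly.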

5.  Conclusions 

In this paper, the formulation of a block modified
covariance algorithm with individual convergence factors
(BMCAI) has been presented. The convergence factors are
optimally selected to minimize the combined
forward-backward squared error in each block. The BMCAI
computes the individual convergence factors using
Gauss-Seidel iterations. The BMCAI has been applied to AR
spectral estimation and, in the simulation of Section 4,
outperformed the existing BMCA.
Figure 1. (a) PSD estimation using the BMCAI based on
Gauss-Seidel iterations with 10 realizations of
100-sample records, p = 32 and SNR = 42 dB, (b) average of
the ten simulations shown in (a), and (c) PSD using the
BMCA with the same record and prediction order but with a
fixed $\mu$ = 0.001.
Abstract

Simultaneous  registration  and  tracking  has  advan¬ 
tages  over  other  track  registration  techniques  because  it 
is  capable  of  responding  to  changes  in  registration  er¬ 
rors.  The  track  registration  problem  is  presented  for 
a  network  of  two  geographically  distributed  radars  with 
unknown  measurement  biases  that  are  fixed  or  slowly 
varying. The extended Kalman filter, which receives
unregistered and cluttered plots from the radars and
outputs registered tracks, is used to carry out
centralized simultaneous registration and tracking. A
multisensor probabilistic data association filter (PDAF)
that combines locally gated plots from the radars is
developed to enable the system to operate under clutter.
The algorithm satisfies a number of important registration
design criteria.


1  Introduction 

In multisensor tracking, registration is vital if
errors due to site uncertainties, antenna orientation and
improper calibration of range and time are to be
minimized. Errors that are fixed but unknown can be
handled as part of a multisensor initialization procedure,
and a suitable off-line approach is the generalized linear
least-squares estimation (GLSE) technique [1, pp. 180],
[2, pp. 68].

Unfortunately  sensor  measurement  biases  can  vary 
over  time  due  to  technical  maintenance  or  the  effect  of 
a  changing  wind  direction  on  the  mechanics  of  a  radar 
antenna  [3,  pp.  38].  This  requires  on-line  estimation 
of  biases  and  tracks  under  clutter  using  an  algorithm 
that  satisfies  some  basic  registration  design  criteria  [1, 
pp.  173]. 


In this paper we consider a system of two 2-D radar
detectors A and B located at $(\eta_1, \zeta_1)$ and
$(\eta_2, \zeta_2)$ respectively and responsible for a
common cluttered surveillance region that is being
traversed by a single non-maneuvering target T. We assume
that the target position at time index $k$ with respect to
a common Cartesian coordinate system is $(x_1(k), x_2(k))$.

Furthermore each radar measures target position in polar
coordinates with the origin of the measurement system
being located at the radar antenna. We therefore assume
that the target as reported by sensors A and B is at
$T_A(\rho_1(k), \theta_1(k))$ and $T_B(\rho_2(k), \theta_2(k))$
respectively and that these measurements (or plots)
include fixed but unknown biases $\delta\rho_i$ and
$\delta\theta_i$ and measurement noise $v_i(k)$ for
$i = 1, 2$. The measurement equations for the two radars
therefore take on the form

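$$y_i(k) = \begin{bmatrix} \rho_i(k) \\ \theta_i(k) \end{bmatrix} = h_i(x(k)) + \begin{bmatrix} \delta\rho_i \\ \delta\theta_i \end{bmatrix} + v_i(k) \tag{1}$$

where $h_i(\cdot)$ gives the bias-free range and azimuth of
the target as seen from radar $i$, and $v_i(k)$, $i = 1, 2$,
are zero-mean, mutually uncorrelated, white Gaussian noise
processes of known covariance $R_i(k)$.

With the bias terms unknown it is not possible to
determine the true target position. We must therefore
attempt joint estimation of the target state and biases
using target measurements from the two sensors. To
do this, we append the radar biases to the target state
to obtain the augmented vector

$$x(k) = [\, x_t^T(k) \;\; b^T(k) \,]^T \tag{2}$$

where $x_t(k) = [\, x_1(k) \;\; \dot{x}_1(k) \;\; x_2(k) \;\; \dot{x}_2(k) \,]^T$ is the
target state and $b(k) = [\, \delta\rho_1(k) \;\; \delta\theta_1(k) \;\; \delta\rho_2(k) \;\; \delta\theta_2(k) \,]^T$
is the bias vector. The resulting process equation
therefore takes on the form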
$$x(k+1) = \begin{bmatrix} F & 0 \\ 0 & I \end{bmatrix} x(k) + \begin{bmatrix} G_t w_t(k) \\ u(k) \end{bmatrix} \tag{3}$$

where $I$ denotes an identity matrix of dimension 4
and $u(k)$ is a small process noise term. The equation
can be more compactly written as
$x(k+1) = F(k) x(k) + G(k) w(k)$, where $F(k)$ and $G(k)$ are
known, and $w(k)$ is a zero-mean, white Gaussian noise
process with covariance $Q(k)$. For centralized tracking,
the combined measurement equation takes on the form

$$\begin{bmatrix} y_1(k) \\ y_2(k) \end{bmatrix} = \begin{bmatrix} h_1(x(k)) \\ h_2(x(k)) \end{bmatrix} + [\, 0 \;\; I \,]\, x(k) + \begin{bmatrix} v_1(k) \\ v_2(k) \end{bmatrix} \tag{4}$$

which may be rewritten as $y(k) = h(x(k)) + v(k)$, where
$h(\cdot)$ is known but in general nonlinear. The
measurement noise covariance is
$R(k) = \text{block-diag}(R_1(k), R_2(k))$. The process and
measurement equations in (3) and (4) are in the form
required for approximate conditional mean estimation by
(first order) extended Kalman filtering.
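The augmented model of (2) to (4) can be sketched in a few lines. The paper does not give $F$ or $h_i$ explicitly, so the constant-velocity transition (suggested by the non-maneuvering target), the `atan2` azimuth convention and all function names below are assumptions made only for illustration.

```python
import math

def augmented_transition(T):
    """8x8 block-diagonal matrix diag(F, I4) of eq. (3).

    F is an ASSUMED constant-velocity model for the target state
    [x1, x1_dot, x2, x2_dot]; the four biases are carried over unchanged.
    """
    F = [[1.0, T, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, T],
         [0.0, 0.0, 0.0, 1.0]]
    A = [[0.0] * 8 for _ in range(8)]
    for r in range(4):
        for c in range(4):
            A[r][c] = F[r][c]
        A[4 + r][4 + r] = 1.0
    return A

def predict(A, x):
    """Noise-free one-step prediction x(k+1) = A x(k)."""
    return [sum(A[r][c] * x[c] for c in range(8)) for r in range(8)]

def radar_measurement(x, site, sensor):
    """Noise-free biased plot [range, azimuth] of radar `sensor` (0 or 1).

    ASSUMED geometry: azimuth measured with atan2 from the first axis.
    x = [x1, x1_dot, x2, x2_dot, d_rho1, d_theta1, d_rho2, d_theta2].
    """
    dx, dy = x[0] - site[0], x[2] - site[1]
    d_rho, d_theta = x[4 + 2 * sensor], x[5 + 2 * sensor]
    return [math.hypot(dx, dy) + d_rho, math.atan2(dy, dx) + d_theta]

# Usage with illustrative numbers
x = [3.0, 1.0, 4.0, 0.0, 0.5, 0.01, -0.2, 0.0]
x_next = predict(augmented_transition(T=2.0), x)
y_a = radar_measurement(x, site=(0.0, 0.0), sensor=0)
```

Because the bias block of the transition matrix is the identity, the filter treats the biases as constants perturbed only by the small noise term $u(k)$, which is what allows slowly varying registration errors to be followed.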


2  Cluttered Environment

We now extend the method to the case of cluttered (false)
measurements arising from two separate sensors that are
tracking a single target through a common surveillance
region. Denote the set of measurements obtained by sensor
$i$ at time $k$ by
$Y_i(k) = \{ y_1^i(k), y_2^i(k), \ldots, y_{m_k^i}^i(k) \}$.
In heavily cluttered scenarios, validation gates can be
applied to reduce the number of measurements for
processing. The number of validated measurements per
sensor per scan, $m_k^i$, is a random variable. In addition
to the unknown sensor biases, there is uncertainty as to
which measurement (if any) in $Y_i(k)$ corresponds to the
target of interest.

Our approach to the problem is similar to that presented
in [5] where a method was developed for the fusion of
multiple measurements arising from a common target. The
basis of the approach is probabilistic data association
(PDA) [4, pp. 164]. This is a suboptimal state estimation
scheme which approximates the Gaussian mixture density of
the target state by a single Gaussian PDF at each
processing stage.

The set of mutually exclusive association hypotheses
for the procedure follows:

• $\theta_{00}(k)$ - no measurement in $Y_1(k)$ or $Y_2(k)$ is a
target measurement;

• $\theta_{i0}(k)$ - measurement $y_i^1(k)$ in $Y_1(k)$ is a target
measurement, all other measurements in $Y_1(k)$ and
$Y_2(k)$ are clutter, $i = 1, \ldots, m_k^1$;

• $\theta_{0j}(k)$ - measurement $y_j^2(k)$ in $Y_2(k)$ is a target
measurement, all other measurements in $Y_1(k)$ and
$Y_2(k)$ are clutter, $j = 1, \ldots, m_k^2$;

• $\theta_{ij}(k)$ - measurement $y_i^1(k)$ in $Y_1(k)$ and $y_j^2(k)$ in
$Y_2(k)$ are target measurements, all other
measurements in $Y_1(k)$ and $Y_2(k)$ are clutter,
$i = 1, \ldots, m_k^1$, $j = 1, \ldots, m_k^2$.

We therefore have a total of
$m_k^1 m_k^2 + m_k^1 + m_k^2 + 1$ possible association
hypotheses, each of which has an association probability
defined for $i = 0, 1, \ldots, m_k^1$ and
$j = 0, 1, \ldots, m_k^2$ by

$$\beta_{ij}(k) = \Pr\{ \theta_{ij}(k) \mid Y_1^k, Y_2^k \} \tag{5}$$

where $Y_i^k$ denotes the cumulative data set for sensor $i$.

The joint registration and tracking process consists
of propagating the approximate conditional mean of
the combined target and sensor bias state
$\hat{x}(k-1|k-1)$ and its covariance $P(k-1|k-1)$ to obtain
$\hat{x}(k|k-1)$ and $P(k|k-1)$ from which


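$$\hat{y}_i(k|k-1) = H_i(k)\, \hat{x}(k|k-1)$$

$$S_i(k) = H_i(k)\, P(k|k-1)\, H_i^T(k) + R_i$$

where $H_i(k)$ is the Jacobian of $h(\cdot)$ evaluated at the
state prediction $\hat{x}(k|k-1)$ and $S_i(k)$ is the covariance
of the innovation process of sensor $i = 1, 2$. The
validation gate for each sensor is then defined by an
ellipsoid centred on the predicted measurement according to

$$\{\, y_j^i(k) : \epsilon_j^i \le \gamma_i \,\}, \tag{6}$$

where

$$\epsilon_j^i = \big( y_j^i(k) - \hat{y}_i(k|k-1) \big)^T S_i^{-1}(k) \big( y_j^i(k) - \hat{y}_i(k|k-1) \big),$$

$j = 1, \ldots, m_k^i$, $i = 1, 2$, and $N$ is the dimension of
$y_j^i(k)$. The measurements are assumed Gaussian and so
$\epsilon_j^i$ is $\chi^2$ distributed with $N$ degrees of freedom.
The threshold $\gamma_i$ is therefore chosen from a $\chi^2(N)$
probability distribution according to
$\Pr\{ \epsilon_j^i \le \gamma_i \} = P_{Gi}$ where $P_{Gi}$ is
sufficiently high. Using the argument in [4, pp. 157],
equation (5) takes on the form

$$\beta_{ij}(k) = \begin{cases}
C\, c_{ij}, & \text{for } i = 1, \ldots, m_k^1 \text{ and } j = 1, \ldots, m_k^2 \\
C\, b_i, & \text{for } i = 1, \ldots, m_k^1,\ j = 0 \\
C\, c_j, & \text{for } i = 0 \text{ and } j = 1, \ldots, m_k^2 \\
C\, a, & \text{for } i = 0 \text{ and } j = 0
\end{cases} \tag{7}$$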
where $C$ is a normalizing constant and

$$a = (1 - P_{D1} P_{G1})\, \lambda_1\, (1 - P_{D2} P_{G2})\, \lambda_2$$

$$b_i = \frac{P_{D1} (1 - P_{D2} P_{G2})\, \lambda_2}{(2\pi)^{N/2} |S_1(k)|^{1/2}}\, \exp\left\{ -\tfrac{1}{2}\, \nu_i^{1\,T}(k)\, S_1^{-1}(k)\, \nu_i^1(k) \right\}$$

$$c_j = \frac{(1 - P_{D1} P_{G1})\, \lambda_1\, P_{D2}}{(2\pi)^{N/2} |S_2(k)|^{1/2}}\, \exp\left\{ -\tfrac{1}{2}\, \nu_j^{2\,T}(k)\, S_2^{-1}(k)\, \nu_j^2(k) \right\}$$

$$c_{ij} = \frac{P_{D1} P_{D2}}{(2\pi)^{N} |S(k)|^{1/2}}\, \exp\left\{ -\tfrac{1}{2}\, \nu_{ij}^T(k)\, S^{-1}(k)\, \nu_{ij}(k) \right\}$$

The vector $\nu_i^1(k)$ is the innovation at sensor 1 based
on measurement $i$, $\nu_j^2(k)$ is the innovation at sensor 2
based on measurement $j$, and
$\nu_{ij}(k) = [\, \nu_i^{1\,T}(k) \;\; \nu_j^{2\,T}(k) \,]^T$
is the innovation from the two sensors for
$i = 1, \ldots, m_k^1$, $j = 1, \ldots, m_k^2$. $P_{Di}$ ($P_{Gi}$) is the
detection (gate) probability of sensor $i$, $\lambda_i$ is the
spatial density of clutter measurements for sensor
$i = 1, 2$, and the probability mass function of the number
of clutter measurements is Poisson for each sensor. Having
obtained $\beta_{ij}(k)$, it is now possible to evaluate the
conditional state estimate and error covariance.
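The gating rule (6) and the normalization in (7) can be illustrated with a short sketch. It assumes two-dimensional (range, azimuth) measurements, for which the chi-square threshold has a closed form, and it takes the Gaussian likelihood values as inputs rather than computing them. The function names and the numerical likelihoods are hypothetical; the values of $P_{Di}$, $P_{Gi}$ and $\lambda_i$ are the ones quoted in the caption of Figure 1.

```python
import math

def gate_threshold_2d(P_G):
    """Threshold gamma with Pr{eps <= gamma} = P_G for a chi-square variable
    with 2 degrees of freedom, whose CDF is 1 - exp(-gamma / 2)."""
    return -2.0 * math.log(1.0 - P_G)

def mahalanobis_2d(y, y_pred, S):
    """eps = (y - y_pred)^T S^{-1} (y - y_pred) for a 2x2 covariance S."""
    d0, d1 = y[0] - y_pred[0], y[1] - y_pred[1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return (S[1][1] * d0 * d0 - (S[0][1] + S[1][0]) * d0 * d1
            + S[0][0] * d1 * d1) / det

def association_probabilities(g1, g2, g12, P_D, P_G, lam):
    """beta[i][j] of eq. (7); index 0 means 'no target measurement from that sensor'.

    g1[i-1], g2[j-1]: Gaussian likelihoods of the single-sensor innovations;
    g12[i-1][j-1]: Gaussian likelihood of the stacked innovation.
    All likelihoods are assumed to be evaluated by the caller.
    """
    miss1 = (1.0 - P_D[0] * P_G[0]) * lam[0]
    miss2 = (1.0 - P_D[1] * P_G[1]) * lam[1]
    w = [[0.0] * (len(g2) + 1) for _ in range(len(g1) + 1)]
    w[0][0] = miss1 * miss2                                  # a
    for i, gi in enumerate(g1, start=1):
        w[i][0] = P_D[0] * miss2 * gi                        # b_i
    for j, gj in enumerate(g2, start=1):
        w[0][j] = miss1 * P_D[1] * gj                        # c_j
    for i in range(1, len(g1) + 1):
        for j in range(1, len(g2) + 1):
            w[i][j] = P_D[0] * P_D[1] * g12[i - 1][j - 1]    # c_ij
    total = sum(sum(row) for row in w)                       # equals 1 / C
    return [[v / total for v in row] for row in w]

# Usage with illustrative likelihood values
gamma = gate_threshold_2d(0.989)
eps = mahalanobis_2d([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
beta = association_probabilities(
    g1=[0.2, 0.05], g2=[0.1], g12=[[0.02], [0.005]],
    P_D=(0.9, 0.9), P_G=(0.989, 0.989), lam=(0.0002, 0.0002))
```

The normalizing constant $C$ never has to be formed explicitly: dividing the unnormalized weights by their sum enforces that the hypotheses are exhaustive.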

2.1  Conditional  Mean  and  Error  Covariance 

The state update equation of the PDAF therefore
takes on the form
$\hat{x}(k|k) = \hat{x}(k|k-1) + W(k)\, \nu(k)$,
where $W(k)$ is the filter gain and $\nu(k)$ is the combined
innovation given by
$\nu(k) = \sum_{i=0}^{m_k^1} \sum_{j=0}^{m_k^2} \beta_{ij}(k)\, \nu_{ij}(k)$ or

$$\nu(k) = \begin{bmatrix}
\sum_{i=1}^{m_k^1} \nu_i^1(k) \left[ \beta_{i0}(k) + \sum_{j=1}^{m_k^2} \beta_{ij}(k) \right] \\
\sum_{j=1}^{m_k^2} \nu_j^2(k) \left[ \beta_{0j}(k) + \sum_{i=1}^{m_k^1} \beta_{ij}(k) \right]
\end{bmatrix}.$$

The error covariance associated with the updated
state estimate has the form

$$P(k|k) = \beta_{00}(k)\, P(k|k-1) + \alpha_1(k)\, P_1(k|k) + \alpha_2(k)\, P_2(k|k) + \alpha_{12}(k)\, P_{12}(k|k) + \tilde{P}(k)$$

$$P_i(k|k) = P(k|k-1) - W_i(k)\, S_i(k)\, W_i^T(k),$$

$$P_{12}(k|k) = P(k|k-1) - W(k)\, S(k)\, W^T(k)$$

where $W(k) = [\, W_1(k) \;\; W_2(k) \,]$ and $W_i(k)$ is the gain
corresponding to measurements from sensor $i = 1, 2$.
With probability $\beta_{00}(k)$, none of the measurements is
correct and so the covariance $P(k|k-1)$, indicating no
update, appears with this weighting. Similarly, with
probability $\alpha_1(k) = \sum_{i=1}^{m_k^1} \beta_{i0}(k)$, a target
measurement is available only to sensor 1 and so the
updated covariance $P_1(k|k)$ has this weighting.
Furthermore, with probability
$\alpha_2(k) = \sum_{j=1}^{m_k^2} \beta_{0j}(k)$, a target measurement
is available only to sensor 2 and so the updated
covariance $P_2(k|k)$ has this weighting. Lastly, with
probability
$\alpha_{12}(k) = 1 - \beta_{00}(k) - \alpha_1(k) - \alpha_2(k)$, a target
measurement is available to both sensors 1 and 2 and
so the updated covariance $P_{12}(k|k)$ appears with this
weighting. The last term $\tilde{P}(k)$ is positive
semidefinite [4, pp. 324], and represents the effect of
the measurement origin uncertainty since we do not know
which of the $m_k^1 m_k^2 + m_k^1 + m_k^2$ validated
combinations actually represents the target measurement
combination. The factors $\beta_{ij}(k)$ and $\tilde{P}(k)$ are
measurement dependent and so the covariance update is a
stochastic Riccati equation.

3  Numerical  Results  and  Conclusions 

Figure  1  shows  a  network  of  radars  A  and  B  each 
with  fixed  but  unknown  range  and  azimuth  biases.  For 
a  target  originating  from  location  (250,95),  two  dis¬ 
tinct  plots  each  displaced  from  the  true  trajectory  are 
obtained.  Tracking  was  done  under  Gaussian  measure¬ 
ment  noise  and  a  clutter  density  that  generates  between 
0  and  6  validated  clutter  samples  per  sensor  during 
each  stage.  Figure  2  shows  the  composition  of  vali¬ 
dated  measurements.  Figures  3  and  4  show  bias  esti¬ 
mates  and  their  variances. 

The registration and tracking algorithm is robust
under clutter once track initiation has been properly
done. It can track fixed or slowly varying registration
errors under clutter, and it rests on the extended Kalman
filter and PDA framework described above. Furthermore,
it provides quality estimates for the solution set and
can be adapted to cater for a wide range of system
configurations.
Figure 1. Trajectory estimate under clutter with
$P_{Di} = 0.90$, $R_i = \mathrm{diag}(4 \times 10^{-\ldots}, 4 \times 10^{-\ldots})$,
$P_{Gi} = 0.989$ and $\lambda_i = 0.0002$, $i = 1, 2$.

Figure 2. Variation of validated clutter, missed
detections and ungated detections during the tracking
process.

Figure 3. Estimates of radar biases during the
tracking process.

Figure 4. Variances of range and azimuth bias
estimates during the tracking process.
Abstract

An approach to array processing (i.e., direction
finding, signal separation and reconstruction, and
calibration) based on the Analytical Constant Modulus
Algorithm is considered. The main advantage of this
approach is that it eliminates both the multidimensional
search associated with Maximum Likelihood based estimators
and the one-dimensional search associated with MUSIC type
methods.

The sensor array elements are assumed to have the same
angle dependent, unknown gain pattern, up to a
multiplicative constant. We show that under this
assumption it is possible to estimate the array response
matrix and then use the result for direction finding, if
the nominal array manifold is known, at least
approximately. It is also possible to use the estimated
array response matrix in order to separate and reconstruct
the signals, or calibrate the array shape or response.

1  Introduction 

In  recent  years  many  approaches  to  direction  finding 
were  proposed.  All  of  these  approaches  are  associated 
with  some  form  of  search.  Maximum  Likelihood  based 
techniques  like  the  EM  algorithm  [1],  IQML  [2],  APM 
[3],  MODE  [4]  and  others,  require  multidimensional 
search in the parameter space. The main difficulty in
using  these  approaches  is  that  the  algorithms  tend  to 
converge  to  a  local  stationary  point  and  convergence 
to  the  global  maximum  (or  minimum)  is  not  guaran¬ 
teed.  Even  MUSIC  [5]  type  algorithms  require  a  one¬ 
dimensional  search  which  is  indeed  free  from  conver¬ 
gence  problems  but  is  associated  with  a  lengthy  search 
procedure  (the  search  must  be  performed  on  a  fine 
grid  in  order  to  avoid  missing  the  narrow  peaks  of  the 
MUSIC  spectrum).  Search-free  techniques  like  Root- 
MUSIC  and  ESPRIT  [9]  require  special  array  configu¬ 
rations  that  limit  their  applicability. 

We  consider  an  approach  to  steering  vector  estima¬ 
tion  that  does  not  rely  on  a  search  procedure.  The 
steering  vectors  of  the  array  are  estimated  via  a  short 
non-iterative  algorithm.  The  estimates  are  close  to  the 
true  steering  vectors,  if  enough  data  samples  are  col¬ 
lected.  If  desired,  the  estimates  can  be  further  im¬ 
proved  using  a  few  iterations  that  are  guaranteed  to 
converge.  These  estimates  of  the  steering  vectors  can 
be used for direction finding, signal separation and
reconstruction, or array shape/phase calibration. The
fact  that  the  algorithm  is  essentially  search  free  is  its 
most  appealing  feature. 

The  method  is  based  on  a  version  of  the  Analytical 
Constant  Modulus  Algorithm  that  was  recently  pro¬ 
posed  by  van  der  Veen  and  Paulraj  [6]  for  estimating 
constant  modulus  signals.  However,  the  new  approach 
is  not  limited  to  constant  modulus  signals  or  any  other 
specific  signals. 

2  Problem  Formulation 

We  begin  by  describing  the  data  model  for  the  ob¬ 
servation  of  narrowband  signals  by  an  array  of  sensors. 

We consider an $M$-element array of sensors and $N$
narrowband signal sources, and define the $M \times 1$ vector
$a_n$ to be the complex array response for the $n$th source.

The outputs of the $M$ array elements at the $k$-th
sample are arranged in an $M \times 1$ vector,

$$x(k) = A\, s(k) + n(k), \qquad k = 1, 2, \ldots, N_s \tag{1}$$

where $n(k)$ is the noise vector, $s(k)$ is the signal vector,
and

$$A = [\, a_1, a_2, \cdots, a_N \,]. \tag{2}$$

Assuming that the signal vectors $s(k)$ and the noise
vectors $n(k)$ are realizations of stationary, zero mean
random processes, and there is no correlation between
the noise and the signals, the data covariance matrix
is

$$R = E\{ x(k)\, x^H(k) \} = A R_s A^H + \eta I \tag{3}$$

where $R_s$ is the signal covariance matrix (a nonsingular
matrix) and $\eta I$ is the noise covariance matrix.

Given $N_s$ snapshots, the sample covariance is given
by

$$\hat{R} = \frac{1}{N_s} \sum_{k=1}^{N_s} x(k)\, x^H(k). \tag{4}$$

We assume that the sensors have identical gain patterns,
up to a multiplicative scalar factor. We therefore
use the following model for $A$:

$$[A]_{m,n} = g_m\, e^{j \phi_{m,n}}. \tag{5}$$

The scalar $g_m$ is the multiplicative factor of the $m$th
sensor gain pattern, and the constants $\phi_{m,n}$ are the
unknown sensor phase responses. We are interested
in estimating the steering vector matrix, $A$. As a
by-product, we also estimate the noise variance, $\eta$. Note
that the observations, namely $x(k)$, do not change if
$A$ is right multiplied by a diagonal matrix while $s(k)$
is left multiplied by the inverse of the same diagonal
matrix. This means that the steering vectors and the
signals can be observed (and estimated) only up to a
multiplicative complex scalar. We therefore assume,
without loss of generality, that the first element of each
steering vector is one.
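As a concrete reading of the sample covariance (4) and of the normalization that removes the scalar ambiguity, consider the following minimal sketch. The function names and the toy snapshots are illustrative assumptions, not part of the paper.

```python
def sample_covariance(snapshots):
    """R_hat = (1/N_s) * sum_k x(k) x(k)^H for a list of M-element complex snapshots."""
    Ns, M = len(snapshots), len(snapshots[0])
    R = [[0j] * M for _ in range(M)]
    for x in snapshots:
        for m in range(M):
            for n in range(M):
                R[m][n] += x[m] * x[n].conjugate()
    return [[v / Ns for v in row] for row in R]

def normalize_steering(a):
    """Remove the scalar ambiguity by forcing the first element to one."""
    return [v / a[0] for v in a]

# Usage: two snapshots from a two-element array (illustrative values)
R_hat = sample_covariance([[1 + 1j, 2 + 0j], [1j, -1 + 0j]])
a_norm = normalize_steering([2j, 4 + 0j])
```

The estimate is Hermitian by construction, and the normalized vector represents the whole family of steering vectors that differ only by a complex scale.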

Note that if the signals are uncorrelated then $R_s$ is
diagonal. In this case the scalars $g_m$ can be estimated
by observing that the elements on the diagonal of the
data covariance are given by

$$R_{m,m} = g_m^2 \sum_{n=1}^{N} [R_s]_{n,n} + \eta. \tag{6}$$

Under the assumption that $g_1 = 1$ we get

$$\sum_{n=1}^{N} [\hat{R}_s]_{n,n} = \hat{R}_{1,1} - \hat{\eta} \tag{7}$$

where $\hat{\ }$ indicates estimated values.

This estimation procedure for $g_m$ does not hold in
the general case of correlated signals. We therefore
resort to the assumption that $g_m$ is given. This
assumption is not restrictive in most direction finding
applications. However, it might be somewhat restrictive in
the signal estimation problem for which the sensor
response is not of prime concern.

3  Steering  Vector  Estimation 

The eigenvalue decomposition of the data covariance is given by

$$R = A R_s A^H + \eta I = U_s \Lambda_s U_s^H + \eta\, U_n U_n^H \qquad (9)$$

where $\Lambda_s = \mathrm{diag}\{\lambda_1, \ldots, \lambda_N\}$ is a diagonal matrix containing the N biggest eigenvalues in decreasing order, and the associated eigenvectors are the columns of the matrix $U_s$. The columns of $U_n$ are the remaining M - N eigenvectors, associated with $\lambda_{N+1} = \cdots = \lambda_M = \eta$. Subtracting $\eta I$ from the above equation we get

$$A R_s A^H = U_s \Gamma_s U_s^H \qquad (10)$$

where $\Gamma_s = \mathrm{diag}\{\lambda_1 - \eta, \ldots, \lambda_N - \eta\}$. Hence,

$$A = U_s W \qquad (11)$$

where W is a weighting matrix.

Based on an estimate of $U_s$ we are interested in estimating the matrix A, under the constraint that the modulus of each column is given by $[1, g_2, \ldots, g_M]^T$. This raises the question of how many vectors of this form are contained in the subspace spanned by the columns of A. We show in [8] that in some cases there are more than N such vectors in the range of A, and therefore the solution of (11) is not unique. However, in most cases, there are exactly N vectors with the given modulus in the column space of A. The latter is assumed in the sequel.

The eigendecomposition of $\hat{R}$ provides the estimate, $\hat{U}_s$, of $U_s$. Equation (11) indicates that minimizing the distance between its left-hand side and the estimate of its right-hand side yields estimates of the steering vectors.

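In order to find W we follow the steps of [6]. Any column vector, w, in W must satisfy the equations

$$w^H u_m u_m^H w = g_m^2, \quad m = 1, 2, \ldots, M \qquad (12)$$

where $u_m^H$ is the mth row vector of $U_s$. These equations can be written in a different form as

$$P\,(w \otimes w^*) = g \qquad (13)$$

where, consistent with (12), the mth element of g is $g_m^2$ and the mth row of P is given by

$$\mathrm{vec}^T\{u_m u_m^H\} \qquad (14)$$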

Define the Householder matrix

$$Q = I - 2\,\frac{q q^H}{q^H q}, \qquad q = g + \|g\|\, e_1 \qquad (15)$$

where $e_1$ is a vector of zeros except for the first element, which is one. By left multiplying (13) by Q we get

$$QP\,(w \otimes w^*) = -\|g\|\, e_1 \qquad (16)$$

Define the $(M - 1) \times N^2$ matrix $\bar{P}$ to be QP, with the first row deleted, so we get

$$\bar{P}\,(w \otimes w^*) = 0 \qquad (17)$$

This equation indicates that $w \otimes w^*$ belongs to the null space of $\bar{P}$. Note that

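$$\mathrm{rank}\{\bar{P}\} \le \min\{M - 1, N^2\} \le M - 1 \qquad (18)$$

Substituting in

$$\dim\{\mathrm{null}\{\bar{P}\}\} = N^2 - \mathrm{rank}\{\bar{P}\} \qquad (19)$$

we get

$$\dim\{\mathrm{null}\{\bar{P}\}\} \ge N^2 - (M - 1) \qquad (20)$$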

In order that the null space $\mathrm{null}\{\bar{P}\}$ will span the space of all the N vectors $w \otimes w^*$ we must have $\dim\{\mathrm{null}\{\bar{P}\}\} = N$. Hence, we get the condition

$$N \ge N^2 - M + 1 \qquad (21)$$

or, equivalently,

$$M \ge N^2 - N + 1 \qquad (22)$$
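Equations (12) to (22) can be checked numerically with a short NumPy sketch. It assumes a noise-free setting in which the signal subspace basis is obtained from a QR factorization of a synthetic A (rather than from an eigendecomposition of a sample covariance), and a column-major vec operator; the helper name `build_P` and the scenario values are hypothetical.

```python
import numpy as np

def build_P(U_s):
    """Stack the rows vec^T{u_m u_m^H}, where u_m^H is the m-th row of U_s (eq. (14))."""
    rows = []
    for m in range(U_s.shape[0]):
        u_m = U_s[m, :].conj()                  # u_m^H is the m-th row, so u_m is its conjugate
        B = np.outer(u_m, u_m.conj())           # u_m u_m^H
        rows.append(B.flatten(order="F"))       # column-major vec
    return np.array(rows)

rng = np.random.default_rng(1)
M, N = 4, 2                                     # satisfies M >= N^2 - N + 1 = 3, eq. (22)
g = np.array([1.0, 1.02, 1.34, 0.9])
A = g[:, None] * np.exp(1j * rng.uniform(-np.pi, np.pi, (M, N)))
U_s, _ = np.linalg.qr(A)                        # orthonormal basis of the column space of A
W = U_s.conj().T @ A                            # so that A = U_s W, eq. (11)

P = build_P(U_s)
g_sq = g ** 2                                   # right-hand side of eq. (13)

# Householder matrix of eq. (15); g_sq is real, so q^H = q^T.
q = g_sq.copy()
q[0] += np.linalg.norm(g_sq)
Q = np.eye(M) - 2.0 * np.outer(q, q) / (q @ q)

P_bar = (Q @ P)[1:, :]                          # QP with the first row deleted
residuals = [np.linalg.norm(P_bar @ np.kron(W[:, n], W[:, n].conj())) for n in range(N)]
null_dim = N ** 2 - np.linalg.matrix_rank(P_bar, tol=1e-9)
```

In this setting every column of W satisfies (13) and (17) up to rounding error, and the null space of the reduced matrix has dimension at least N.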

If the condition (22) is met, then the solutions to (17) span the null space of $\bar{P}$. Assume now that $\mathrm{null}\{\bar{P}\}$ is spanned by the vectors $y_1, \ldots, y_N$. Each of these vectors can be obtained by a linear combination of the solutions of (17). Hence,

$$y_n = \sum_{j=1}^{N} \alpha_{jn}\,(w_j \otimes w_j^*), \quad n = 1, \ldots, N \qquad (23)$$


where $\alpha_{jn}$ are complex scalar coefficients. Performing the inverse vec operation on (23) we get

$$\mathrm{vec}^{-1}\{y_n\} = \sum_{j=1}^{N} \alpha_{jn}\,(w_j^* w_j^T) = W^* \Lambda_n W^T \qquad (24)$$

Hence, to obtain W we have to simultaneously diagonalize the matrices $\mathrm{vec}^{-1}\{y_n\}$, n = 1, 2, ..., N. An algorithm for performing this task can be found in [7]. Our approach is to simultaneously diagonalize only two matrices. This can be done by solving a generalized eigenvalue problem. Therefore we define

$$Y_1 = \sum_{n=1}^{N'} \mathrm{vec}^{-1}\{y_n\} = W^* \Lambda_1 W^T \qquad (25)$$

and

$$Y_2 = \sum_{n=N'+1}^{N} \mathrm{vec}^{-1}\{y_n\} = W^* \Lambda_2 W^T \qquad (26)$$

where N' is the integer part of N/2. The eigenvectors $v_j$ that satisfy

$$Y_1 v_j = Y_2 v_j \lambda_j \qquad (27)$$

can be arranged in a matrix whose inverse is $W^T$.

In order to obtain the final estimate of A we minimize the cost function

$$f(\hat{U}_s, A) = \|\hat{U}_s W - A\|^2. \qquad (28)$$

This can be done in two steps that can be repeated several times. Convergence is guaranteed since the cost function value decreases (or stays the same) in each step.

1) In this step we use the last estimate of W and find A. Since the moduli of the elements of A are known, we only have to find the phases of A which minimize the cost function. Obviously, the minimizing phase estimates are given by

$$\mathrm{phase}\{A_{ij}\} = \mathrm{phase}\{[\hat{U}_s W]_{ij}\} \qquad (29)$$

2) In this step we use the last estimate of A to estimate W. The W that minimizes the cost function is given by

$$W = \hat{U}_s^H A \qquad (30)$$

Usually, between 3 and 10 iterations are needed.

4 Application to Direction Finding

Once the steering vector phases have been estimated, the signal directions of arrival (DOAs) can be easily extracted. If the array response is close to the free space model of propagation, then the phase of the mth element of the nth steering vector is given by

$$\phi_{m,n} = 2\pi\,(d_{x,m} \sin\theta_n \cos\psi_n + d_{y,m} \sin\theta_n \sin\psi_n + d_{z,m} \cos\theta_n) \qquad (31)$$

where $d_{x,m}$, $d_{y,m}$, $d_{z,m}$ are the Cartesian coordinates (in wavelength units) of the mth sensor, while $\theta_n$ and $\psi_n$ are the elevation angle (with respect to the z axis) and the azimuth (with respect to the x axis), respectively, of the nth source. Note that $\phi_{m,n}$ is known only modulo $2\pi$. Hence, phase unwrapping must be used before applying the following method.

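Define

$$\phi_n = [\phi_{1,n}, \ldots, \phi_{M,n}]^T \qquad (32)$$

$$\mu_n = [\sin\theta_n \cos\psi_n,\ \sin\theta_n \sin\psi_n,\ \cos\theta_n]^T \qquad (33)$$

$$H = 2\pi\,[d_x,\ d_y,\ d_z] \qquad (34)$$

where $d_x$, $d_y$, $d_z$ are the M x 1 vectors of sensor coordinates, so that the free space phase model becomes

$$H \mu_n = \phi_n \qquad (35)$$

Hence we have

$$\hat{\mu}_n = H^\dagger \hat{\phi}_n \qquad (36)$$

where $H^\dagger$ is the left inverse of H. The estimates of $\theta_n$, $\psi_n$ follow immediately.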
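The least-squares step (35)-(36) can be sketched in NumPy as follows. The sketch assumes unwrapped phases, sensor coordinates in wavelengths with the first sensor at the origin, and an array geometry for which H has full column rank (which excludes a linear array); the function name `doa_from_phases`, the direction-cosine ordering, and the test geometry are assumptions made for illustration.

```python
import numpy as np

def doa_from_phases(coords, phi):
    """Estimate elevation and azimuth from unwrapped steering vector phases.

    coords: M x 3 sensor coordinates in wavelengths.
    phi:    length-M vector of unwrapped phases for one source.
    """
    H = 2.0 * np.pi * coords
    mu, *_ = np.linalg.lstsq(H, phi, rcond=None)    # least-squares solution, i.e. left inverse applied to phi
    theta = np.arccos(np.clip(mu[2], -1.0, 1.0))    # elevation, measured from the z axis
    psi = np.arctan2(mu[1], mu[0])                  # azimuth, measured from the x axis
    return theta, psi

# Hypothetical 5-sensor volumetric array and a single source.
rng = np.random.default_rng(2)
coords = rng.uniform(-1.0, 1.0, size=(5, 3))
coords[0] = 0.0                                     # reference sensor at the origin
theta_true, psi_true = 0.7, -1.2
mu_true = np.array([
    np.sin(theta_true) * np.cos(psi_true),
    np.sin(theta_true) * np.sin(psi_true),
    np.cos(theta_true),
])
phi = 2.0 * np.pi * coords @ mu_true                # exact (already unwrapped) phases
theta_hat, psi_hat = doa_from_phases(coords, phi)
```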

If  the  array  response  is  given  by  a  calibration  table 
rather  than  an  analytic  expression  then  the  DOA  is 
estimated  by  finding  the  calibration  table  entries  that 
are  close  in  some  sense  to  the  estimated  steering  vector. 
Interpolation  is  usually  required  to  improve  the  system 
accuracy. 
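The two-step refinement of Section 3, which alternates between the phase projection (29) and the least-squares update (30), can be sketched as below. This is a minimal illustration under stated assumptions: the signal subspace basis is exact and has orthonormal columns, the gains are borrowed from the six-sensor example of Section 5, and the starting W is a perturbed version of the true one rather than the output of the generalized eigenvalue step. The function name `refine_steering_matrix` is hypothetical.

```python
import numpy as np

def refine_steering_matrix(U_s, g, W, n_iter=10):
    """Alternate between eq. (29) (phases of A) and eq. (30) (update of W)."""
    costs = []
    for _ in range(n_iter):
        B = U_s @ W
        A = g[:, None] * np.exp(1j * np.angle(B))   # step 1: known moduli, phases taken from U_s W
        costs.append(np.linalg.norm(B - A) ** 2)    # cost (28) for the current pair (W, A)
        W = U_s.conj().T @ A                        # step 2: least-squares W for orthonormal U_s
    return A, W, costs

rng = np.random.default_rng(3)
M, N = 6, 2
g = np.array([1.0, 1.01, 1.35, 1.47, 1.12, 1.09])
A_true = g[:, None] * np.exp(1j * rng.uniform(-np.pi, np.pi, (M, N)))
U_s, _ = np.linalg.qr(A_true)                       # stand-in for the estimated signal subspace
perturbation = 0.05 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
W0 = U_s.conj().T @ A_true + perturbation           # imperfect initial weighting matrix
A_hat, W_hat, costs = refine_steering_matrix(U_s, g, W0)
```

The recorded costs form a non-increasing sequence, which is the convergence argument given for the two-step procedure.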

5 Numerical Examples

Consider a linear array of 3 sensors with element spacing of half a wavelength. The sensor gains are chosen arbitrarily to be 1.0, 1.02, 1.34. The array intercepts two equal-power uncorrelated signals with a signal-to-noise ratio of 10 dB. The direction of arrival (DOA) of one signal is 0° relative to broadside while the DOA of the other signal varies from 5° to 30°. The algorithm is applied to a simulated data matrix with 500 snapshots. Figure 1 shows the experimental standard deviation (circles) and experimental bias (asterisks) of the steering vector phases vs. the DOA separation. Each circle/asterisk is based on 200 experiments. The solid line represents the Cramer-Rao bound, which coincides with the theoretical performance analysis in this case. It is apparent that the bias is negligible and the standard deviation agrees with the theoretical performance analysis and the bound.

Next, consider a linear array of 6 sensors with element spacing of half a wavelength. The sensor gains are chosen to be 1.0, 1.01, 1.35, 1.47, 1.12, 1.09. The array intercepts two correlated equal-power signals with a signal-to-noise ratio of 10 dB. The magnitude of the correlation coefficient is 0.95 and its phase varies from 0° to 180°. The direction of arrival (DOA) of one signal is 0° relative to broadside while the DOA of the other signal is 10°. The algorithm is applied to a simulated data matrix with 500 snapshots. Figure 2 shows the experimental standard deviation and experimental bias of the steering vector phases vs. the correlation coefficient phase. Each circle/asterisk is based on 200 experiments. The solid line represents the theoretical performance analysis, and the dashed line represents the Cramer-Rao bound. We note that the experiments verify the theoretical performance analysis and that the bias is negligible. Observe that, due to the correlation between the signals, the statistical efficiency is lost.

6  Conclusions 

We examined a new approach to direction finding and signal estimation based on steering vector estimation. We showed that it is possible to estimate the array response matrix and then use the result for direction finding, if the nominal array manifold is known, at least approximately. It is also possible to use the array response matrix estimate in order to separate and reconstruct the signals or calibrate the array. The main advantage of the method is that the multidimensional search associated with Maximum Likelihood based estimators, or the single-dimensional search associated with MUSIC type methods, is eliminated. The method can be applied in the presence of specular multipath (using spatial smoothing) but it is not suitable for signal separation in the presence of diffuse multipath.
Figure 2: Standard deviation and bias of steering vector phases vs. correlation phase.


Problem,”  Proceedings  of  ICASSP,  Detroit,  MI, 
pp.  1433-1437,  May  1995. 

[7] A. Bunse-Gerstner, R. Byers, and V. Mehrmann, “Numerical Methods for Simultaneous Diagonalization,” SIAM J. Matrix Anal. Appl., Vol. 4, pp. 927-949, 1993.

[8] A.J. Weiss and B. Friedlander, “Array Processing Using Joint Diagonalization,” European J. Signal Processing, to appear.

[9] R. Roy, et al., “ESPRIT - A Subspace Rotation Approach to Estimation of Parameters of Cisoids,” IEEE Trans. on ASSP, Vol. 34, pp. 1340-2, 1986.




JOINT  TARGET  ANGLE  AND  DOPPLER  ESTIMATION  WITH  FRACTIONAL 
LOWER-ORDER  STATISTICS  FOR  AIRBORNE  RADAR 


Panagiotis  Tsakalides  and  Chrysostomos  L.  Nikias 


Signal  &  Image  Processing  Institute 
Department  of  Electrical  Engineering  -  Systems 
University  of  Southern  California 
Los  Angeles,  CA  90089-2564 
e-mail: tsakalid@sipi.usc.edu


ABSTRACT 

We introduce a new joint spatial- and Doppler-frequency high-resolution estimation technique based on the fractional lower-order statistics of the measurements of a radar array. We define the covariation matrix of the space-time radar observation vector process and apply subspace-based estimation techniques to the sample covariation matrix, resulting in improved target angle and Doppler estimates in the presence of impulsive interference. We name the introduced technique “2-D Robust Covariation-Based MUSIC” or “2-D ROC-MUSIC”. We show that 2-D ROC-MUSIC provides better angle/Doppler estimates than 2-D MUSIC in a wide range of impulsive interference environments and for very low signal-to-noise ratios.

1. INTRODUCTION

Most of the theoretical work in detection and estimation for radar applications has focused on the case where clutter is assumed to follow the Gaussian model. The Gaussian assumption is frequently motivated by the physics of the problem and it often leads to mathematically tractable solutions. However, in many practical instances, experimental results have been reported where clutter returns are impulsive in nature and cannot be appropriately modeled by means of the Gaussian distribution [1]. A number of distributions, based on empirical as well as theoretical grounds, have been proposed for the modeling of non-Gaussian clutter and interference environments [2, 3].

Recently, a statistical model for impulsive clutter has been proposed, which is based on the theory of symmetric alpha-stable (SαS) random processes [4]. The model is of a statistical-physical nature and has been shown to arise under very general assumptions and to describe a broad class of impulsive interference. In particular, it has been shown in [4] that the first-order distribution of the amplitude of the radar return follows a SαS law, while the first-order joint distribution of the quadrature components of the envelope of the radar return follows an isotropic stable law. In

The work in this paper was supported by Rome Laboratory under Contract F30602-95-1-0001.


addition, the theory of multivariate sub-Gaussian random processes provides an elegant and mathematically tractable framework for the solution of the detection and parameter estimation problems in the presence of impulsive correlated radar clutter.

As mentioned in [5], much of the work reported for radar systems has concentrated on target detection in Gaussian or non-Gaussian backgrounds [6, 7, 8, 9]. In this paper, we address the parameter estimation problem with a space-time adaptive processing (STAP) radar operating in impulsive clutter and interference environments. We present a new subspace-based method for joint spatial- and Doppler-frequency high-resolution estimation in the presence of impulsive noise which can be modeled as a complex symmetric alpha-stable (SαS) process. In Section 2, we present some necessary preliminaries on α-stable processes. In Section 3, we formulate the STAP problem for airborne radar. In Section 4, we define the covariation matrix of the space-time radar sensor output snapshot and we show that eigendecomposition-based methods, such as the MUSIC algorithm, can be applied to the sample covariation matrix to extract the angle/Doppler information from the measurements. Finally, in Section 5, the improved performance of the proposed source localization method in the presence of a wide range of impulsive noise environments is demonstrated via Monte Carlo experiments.

2.  MATHEMATICAL  PRELIMINARIES 

In this section, we introduce the statistical model that will be used to describe the additive noise. The model is based on the class of isotropic SαS distributions, and is well suited for describing impulsive noise processes [4].

Stable processes satisfy the stability property, which states that linear combinations of jointly stable variables are themselves stable. They arise as limiting processes of sums of independent, identically distributed random variables via the generalized central limit theorem. They are described by their characteristic exponent α, taking values 0 < α ≤ 2. Gaussian processes are stable processes with α = 2. Stable distributions with α < 2 have heavier tails than the normal distribution, possess finite pth order moments only for p < α, and are appropriate for modeling noise with outliers.

A complex random variable (r.v.) $X = X_1 + jX_2$ is isotropic SαS if $X_1$ and $X_2$ are jointly SαS and have a symmetric distribution. The characteristic function of X is given by

$$\varphi(\omega) = E\{\exp(j\,\Re[\omega X^*])\} = \exp(-\gamma |\omega|^\alpha), \qquad (1)$$

where $\omega = \omega_1 + j\omega_2$. The characteristic exponent α is restricted to the values 0 < α ≤ 2 and it determines the shape of the distribution. The smaller the characteristic exponent α, the heavier the tails of the density. The dispersion γ (γ > 0) plays a role analogous to the role that the variance plays for second-order processes. Namely, it determines the spread of the probability density function around the origin.

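Several complex r.v.'s are jointly SαS if their real and imaginary parts are jointly SαS. When X and Y are jointly SαS with 1 < α ≤ 2, the covariation of X and Y is defined by

$$[X, Y]_\alpha = \frac{E\{X Y^{\langle p-1 \rangle}\}}{E\{|Y|^p\}}\,\gamma_Y, \quad 1 \le p < \alpha, \qquad (2)$$

where $\gamma_Y = [Y, Y]_\alpha$ is the dispersion of the r.v. Y, and we use throughout the convention $Y^{\langle\beta\rangle} = |Y|^{\beta-1} Y^*$. Also, the covariation coefficient of X and Y is defined by

$$\lambda_{X,Y} = \frac{[X, Y]_\alpha}{[Y, Y]_\alpha} \qquad (3)$$

and by using (2), it can be expressed as

$$\lambda_{X,Y} = \frac{E\{X Y^{\langle p-1 \rangle}\}}{E\{|Y|^p\}}, \quad \text{for } 1 \le p < \alpha. \qquad (4)$$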

The covariation of complex jointly SαS r.v.'s is not generally symmetric and has the following properties:

P1 If $X_1$, $X_2$ and Y are jointly SαS, then for any complex constants a and b,

$$[aX_1 + bX_2, Y]_\alpha = a[X_1, Y]_\alpha + b[X_2, Y]_\alpha;$$

P2 If $Y_1$ and $Y_2$ are independent and $X_1$, $Y_1$ and $Y_2$ are jointly SαS, then for any complex constants a, b and c,

$$[aX_1, bY_1 + cY_2]_\alpha = a b^{\langle\alpha-1\rangle}[X_1, Y_1]_\alpha + a c^{\langle\alpha-1\rangle}[X_1, Y_2]_\alpha;$$

P3 If X and Y are independent SαS, then $[X, Y]_\alpha = 0$.


3. STAP PROBLEM FORMULATION

Space-time adaptive processing (STAP) refers to multidimensional adaptive algorithms that simultaneously combine the signals from the elements of an array antenna and the multiple pulses of a coherent radar waveform, to suppress interference and provide target detection [10, 5, 11].

Consider a uniformly spaced linear array radar antenna consisting of N elements, which transmits a coherent burst of M pulses at a constant pulse repetition frequency (PRF) $f_r$ and over a certain range of directions of interest. The array receives signals generated by q narrow-band moving targets which are located at azimuth angles $\{\theta_k;\ k = 1, \ldots, q\}$ and have relative velocities with respect to the radar $\{u_k;\ k = 1, \ldots, q\}$ corresponding to Doppler frequencies $\{f_k;\ k = 1, \ldots, q\}$. Since the signals are narrow-band, the propagation delay across the array is much smaller than the reciprocal of the signal bandwidth, and it follows that, by using a complex envelope representation, the array output can be expressed as [10]:

$$x(t) = V(\Theta, \varpi)\, s(t) + n(t), \qquad (5)$$

where

• $x(t) = [x_1(t), \ldots, x_{MN}(t)]^T$ is the array output vector (N: number of array elements, M: number of pulses; t may refer to the index of the coherent processing intervals (CPIs) available at the receiver);

• $s(t) = [s_1(t), \ldots, s_q(t)]^T$ is the signal vector emitted by the sources as received at the reference sensor 1 of the array;

• $V(\Theta, \varpi) = [v(\vartheta_1, \varpi_1), \ldots, v(\vartheta_q, \varpi_q)]$ is the space-time steering matrix ($\varpi_k = f_k / f_r$);

• Space-time steering vector: $v(\vartheta_k, \varpi_k) = b(\varpi_k) \otimes a(\vartheta_k)$

- $a(\vartheta_k) = [1, e^{j2\pi\vartheta_k}, \ldots, e^{j2\pi(N-1)\vartheta_k}]^T$ is the spatial steering vector ($\vartheta_k = \frac{d}{\lambda}\cos(\theta_k)$);

- $b(\varpi_k) = [1, e^{j2\pi\varpi_k}, \ldots, e^{j2\pi(M-1)\varpi_k}]^T$ is the temporal steering vector.

• $n(t) = [n_1(t), \ldots, n_{MN}(t)]^T$ is the noise vector.

Assuming the availability of P coherent processing intervals (CPIs) $t_1, \ldots, t_P$, the data can be expressed as

$$X = V(\Theta, \varpi)\, S + N, \qquad (6)$$

where X and N are the MN x P matrices

$$X = [x(t_1), \ldots, x(t_P)], \qquad (7)$$

$$N = [n(t_1), \ldots, n(t_P)], \qquad (8)$$

and S is the q x P matrix

$$S = [s(t_1), \ldots, s(t_P)]. \qquad (9)$$

Our objective is to jointly estimate the directions-of-arrival $\{\theta_k;\ k = 1, \ldots, q\}$ and the Doppler frequencies $\{f_k;\ k = 1, \ldots, q\}$ of the source targets.


4.  THE  ARRAY  COVARIATION  MATRIX 

We will assume that the q signal waveforms are non-coherent, statistically independent, complex isotropic SαS (1 < α ≤ 2) random processes with zero location parameter and covariation matrix $\Gamma_S = \mathrm{diag}(\gamma_{s_1}, \ldots, \gamma_{s_q})$. Also, the noise vector n(t) is a complex isotropic SαS random process with the same characteristic exponent α as the signals. The noise is assumed to be independent of the signals with covariation matrix $\Gamma_N = \gamma_n I$.

Now, we define the covariation matrix, $\Gamma_X$, of the observation vector process x(t) as the matrix whose elements are the covariations $[x_i(t), x_j(t)]_\alpha$ of the components of x(t).

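By using properties P1-P3, we obtain the following expression for the covariation of the sensor measurements:

$$[x_i(t), x_j(t)]_\alpha = \sum_{k=1}^{q} v_i(\vartheta_k, \varpi_k)\, v_j^{\langle\alpha-1\rangle}(\vartheta_k, \varpi_k)\, \gamma_{s_k} + \gamma_n \delta_{ij}, \quad i, j = 1, \ldots, MN. \qquad (10)$$

In matrix form, (10) gives the following expression for the covariation matrix of the observation vector:

$$\Gamma_X = [x(t), x(t)]_\alpha = V(\Theta, \varpi)\, \Gamma_S\, V^{\langle\alpha-1\rangle}(\Theta, \varpi) + \gamma_n I, \qquad (11)$$

where the (i, j)th element of matrix $V^{\langle\alpha-1\rangle}(\Theta, \varpi)$ results from the (j, i)th element of $V(\Theta, \varpi)$ according to the operation

$$[V^{\langle\alpha-1\rangle}(\Theta, \varpi)]_{i,j} = [V(\Theta, \varpi)]_{j,i}^{\langle\alpha-1\rangle} \qquad (12)$$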

Clearly, when α = 2, i.e., for Gaussian distributed signals and noise, the expression for the covariation matrix is identical to the well-known expression for the covariance matrix:

$$R_X = V(\Theta, \varpi)\, \Sigma\, V^H(\Theta, \varpi) + \sigma^2 I, \qquad (13)$$

where $\Sigma$ is the signal covariance matrix.

When the amplitude response of the sensors equals unity, it follows that

$$[V^{\langle\alpha-1\rangle}(\Theta, \varpi)]_{i,j} = [V(\Theta, \varpi)]_{j,i}^{*} \qquad (14)$$

and thus the covariation matrix can be written as

$$\Gamma_X = V(\Theta, \varpi)\, \Gamma_S\, V^H(\Theta, \varpi) + \gamma_n I. \qquad (15)$$
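The simplification from (11) to (15) for unit-modulus steering vectors can be checked with a short NumPy sketch. It assumes steering vectors of the form commonly used in the STAP literature (spatial and temporal phase ramps combined with a Kronecker product), half-wavelength spacing, and the convention $z^{\langle\beta\rangle} = |z|^{\beta-1} z^*$; the function names are hypothetical, and the angle and Doppler values are those of the experiment in Section 5.

```python
import numpy as np

def spatial_steering(vartheta, n_elements):
    """a(vartheta) = [1, exp(j 2 pi vartheta), ..., exp(j 2 pi (N-1) vartheta)]^T."""
    return np.exp(2j * np.pi * vartheta * np.arange(n_elements))

def temporal_steering(varpi, n_pulses):
    """b(varpi) = [1, exp(j 2 pi varpi), ..., exp(j 2 pi (M-1) varpi)]^T."""
    return np.exp(2j * np.pi * varpi * np.arange(n_pulses))

def space_time_steering(vartheta, varpi, n_elements, n_pulses):
    """v = b(varpi) kron a(vartheta), a vector of length M*N."""
    return np.kron(temporal_steering(varpi, n_pulses), spatial_steering(vartheta, n_elements))

def signed_power(z, beta):
    """Assumed convention: z^<beta> = |z|^(beta-1) * conj(z)."""
    return np.abs(z) ** (beta - 1.0) * np.conj(z)

N, M = 5, 10
angles_deg = np.array([-20.0, -40.0, 40.0])
doppler = np.array([-0.3, -0.2, 0.3])               # normalized Doppler values
vartheta = 0.5 * np.cos(np.deg2rad(angles_deg))     # assumes d / lambda = 1/2
V = np.column_stack([space_time_steering(a, f, N, M) for a, f in zip(vartheta, doppler)])

alpha = 1.5
V_alpha = signed_power(V, alpha - 1.0).T            # eq. (12): apply element-wise, then transpose
```

Because every entry of V has unit modulus, `V_alpha` coincides with the conjugate transpose of V for any α, which is the content of (14).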

Observing (15), we conclude that standard subspace techniques can be applied to the covariation or the covariation coefficient matrices of the observation vector to extract the angle/Doppler information. In practice, we have to estimate the covariation matrix from a finite number of array sensor measurements. A proposed estimator for the covariation coefficient $\lambda_{X,Y}$ is called the fractional lower order moment (FLOM) estimator and is given by [12, 13]

$$\hat{\lambda}_{X,Y} = \frac{\sum_{t=1}^{P} X(t)\, Y^{\langle p-1 \rangle}(t)}{\sum_{t=1}^{P} |Y(t)|^p} \qquad (16)$$

for some 0 < p < α/2. We will refer to the new algorithm resulting from the eigendecomposition of the array covariation coefficient matrix as the 2-D Robust Covariation-Based MUSIC or 2-D ROC-MUSIC.
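A matrix version of the FLOM estimator can be sketched in NumPy as follows. It assumes the convention $Y^{\langle p-1\rangle} = |Y|^{p-2} Y^*$, applies the estimator to every pair of sensor outputs, and uses Gaussian test data purely to exercise the code; the function name `flom_covariation_coefficients` is hypothetical.

```python
import numpy as np

def flom_covariation_coefficients(X, p):
    """FLOM estimate of the covariation coefficient matrix.

    X: (number of sensor outputs) x (number of snapshots) complex data matrix.
    Entry (i, j) is sum_t x_i(t) |x_j(t)|^(p-2) conj(x_j(t)) / sum_t |x_j(t)|^p.
    """
    abs_X = np.abs(X)
    Y = abs_X ** (p - 2.0) * X.conj()       # x_j^<p-1>, evaluated per snapshot
    numerator = X @ Y.T                     # plain transpose: the conjugate is already inside Y
    denominator = np.sum(abs_X ** p, axis=1)
    return numerator / denominator[None, :]

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
C = flom_covariation_coefficients(X, p=0.8)
C2 = flom_covariation_coefficients(X, p=2.0)
R_hat = X @ X.conj().T / X.shape[1]         # ordinary sample covariance, for comparison
```

For p = 2 the estimate reduces to the sample covariance with each column divided by the corresponding diagonal element, so a subspace method applied to it behaves like its second-order counterpart.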


5. EXPERIMENTAL RESULTS

In this section, we show results on the resolution capability and estimation accuracy of the 2-D ROC-MUSIC and 2-D MUSIC methods. The array is linear with five sensors spaced a half-wavelength apart (N = 5). The number of transmitted pulses is M = 10. Three moving targets impinge on the array from directions Θ = [-20°, -40°, 40°] and they have Doppler values D = [-0.3, -0.2, 0.3]. The


Figure 1: 2-D MUSIC and 2-D ROC-MUSIC angle-Doppler spectra (N = 5, M = 10, Θ = [-20°, -40°, 40°], D = [-0.3, -0.2, 0.3]). Additive stable noise (α = 1.5, $\gamma_n$ = 4).


number of snapshots available to the algorithms is P = 1000. The noise follows the bivariate isotropic stable distribution with α = 1.5.

Since the alpha-stable family for α < 2 determines processes with infinite variance, we define an alternative signal-to-noise ratio. Namely, we define the Generalized SNR (GSNR) to be the ratio of the signal power over the noise dispersion $\gamma_n$:

$$\mathrm{GSNR} = 10 \log\left(\frac{1}{\gamma_n M}\sum_{t=1}^{M} |s(t)|^2\right). \qquad (17)$$

The GSNR is 22.3 dB ($\gamma_n$ = 1). The characteristic exponent α of the additive noise is unknown to the ROC-MUSIC algorithm. The parameter p in the estimation of the covariation matrix (cf. (16)) was set equal to p = 0.8. Clearly, MUSIC can be thought of as a special case of ROC-MUSIC with p = 2.

In Figure 1, isosurfaces of space-time spectral estimates are shown for the 2-D ROC-MUSIC and the 2-D MUSIC algorithms. We can see that the 2-D MUSIC method exhibits poor resolution performance and it does not resolve the two closely spaced moving targets. On the other hand, the 2-D
Figure 2: Probability of resolution (a) and mean square error (b) as functions of the source angular separation, α = 1.5.

ROC-MUSIC method exhibits high-resolution capabilities for non-Gaussian additive noise environments.

Figure 2 illustrates the variation of the algorithmic performance with respect to the spatial angle separation of the two closely spaced incoming targets for GSNR = 22.3 dB (α = 1.5). As expected, the resolution capability of both algorithms improves with increased angle separation between the two sources. But for a given probability of resolution, the 2-D ROC-MUSIC algorithm requires a lower angle separation threshold than the 2-D MUSIC algorithm.

6.  CONCLUSIONS 

We considered the problem of target angle and Doppler estimation with an airborne radar employing space-time adaptive processing. We introduced a new joint spatial- and Doppler-frequency high-resolution estimation technique based on the fractional lower-order statistics of the measurements of a radar array. We showed that the proposed 2-D ROC-MUSIC algorithm provides better angle/Doppler estimates than the 2-D MUSIC method, and it can lead to improved STAP radar systems operating in impulsive interference environments.
Abstract

Antenna array pattern synthesis deals with choosing the complex weights of an antenna array in order to satisfy a set of specifications, or with determining whether such a set is feasible. It appears that these problems can often be expressed as convex optimization problems which can be solved numerically with algorithms such as interior point methods. Two examples are given dealing with constrained adaptive array processing and robustness issues.


1  Introduction 

It is well known that the antenna pattern of a linear array in direction θ is given by the amplitude of

$$G(\theta) = \sum_{i=1}^{N} w_i\, e^{j\frac{2\pi}{\lambda} x_i \cos\theta} \qquad (1)$$

where the complex weights $w_i$ are the parameters. The positions of the elements are given by $x_i$ whereas λ is the wavelength. The pattern is easily generalized to any array geometry with two angular variables (azimuth and elevation).

What may not be as well known is that the array pattern is a convex function of the real and imaginary parts of the weights. This important property makes possible the solution of many antenna array synthesis problems using convex optimization and, more particularly, recently developed algorithms (interior point methods).

This is all the more interesting as many other problems arising in array processing are convex. For instance, the noise or signal power with general form $w^H R w$, where R is some covariance matrix and $w^H$ denotes the conjugate transpose of w, is a convex function. The weight level $\|w\|$, defined as a given norm of the weight vector, is another convex function. More generally, convex quadratic functions appear often in antenna array design and we see in the next section how the corresponding problems can be solved.

2  Convex  optimization 

A convex optimization problem can be defined as the minimization of a convex function over a convex set. The important property is that for a convex problem, any local optimum is in fact global. Furthermore, by using optimality conditions or more generally the theory of duality, it is possible to obtain lower bounds on the optimal value and an absolute required precision on the desired results.

It is impossible here to describe more precisely the properties of convex optimization. Let us simply mention the recent book by Hiriart-Urruty and Lemarechal [3]. Even more interesting is the development of very efficient algorithms called interior point methods. The book by Nesterov and Nemirovsky [7] is the most complete account on the subject.

Finally, the article of S. Boyd and L. Vandenberghe [9] shows that convex optimization is of much interest in many engineering fields. Let us show now how to express an antenna array pattern synthesis problem as a convex optimization one.

3  Pattern  synthesis  as  a  convex  program 

In general, it is possible to design optimal antenna array patterns by solving particular convex optimization problems of the general form

$$\begin{array}{ll} \text{minimize} & e^T x \\ \text{subject to} & \|A_i x + b_i\|^2 \le c_i^T x + d_i, \quad i = 1, \ldots, L \end{array} \qquad (2)$$

where $A_i \in R^{m \times n}$, $b_i \in R^m$, $c_i, e, x \in R^n$ and $d_i \in R$. These are called quadratically constrained convex quadratic programs (QCQP). They can be solved with an algorithm described in appendix A. Let us notice that if a general objective f(x) is given, it is easy to replace its minimization with the following problem

$$\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & f(x) \le t \end{array} \qquad (3)$$

so that the choice of a linear objective is general.

Let us now express the array pattern as a quadratic function in order to recover the general form of QCQP problem (2). Expression (1) is a linear complex function of the weights, so that its amplitude squared is a quadratic function of the real and imaginary parts of the complex weights. Generally a normalization constraint $G(\theta_0) = 1$ is used, so that it is possible to eliminate one of the weights as

$$w_N = e^{-j\frac{2\pi}{\lambda} x_N \cos\theta_0}\left(1 - \sum_{i=1}^{N-1} w_i\, e^{j\frac{2\pi}{\lambda} x_i \cos\theta_0}\right)$$

Combining the quadratic expression of the beam pattern and the elimination leads to the general expression $\|A_i x + b_i\|^2$ for a given $|G(\theta_i)|^2$, where x includes the real and imaginary parts of the first (N-1) weights. In this case, m = 2 and n = 2(N-1). Therefore, if we want to constrain the beam pattern in L different directions, we have to choose $c_i = 0$ and the constrained level $d_i$ for i = 1 to L.
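The affine representation of the pattern after eliminating one weight can be sketched in NumPy. The sketch assumes a pattern of the form $G(\theta) = \sum_i w_i \exp(j 2\pi x_i \cos\theta / \lambda)$, positions measured in wavelengths, and a real variable that stacks the real parts and then the imaginary parts of the first N-1 weights; the function names `pattern` and `affine_form` and the 8-element test array are hypothetical.

```python
import numpy as np

def pattern(w, pos, theta, lam=1.0):
    """Assumed pattern: G(theta) = sum_i w_i exp(j 2 pi pos_i cos(theta) / lam)."""
    return np.sum(w * np.exp(2j * np.pi * pos * np.cos(theta) / lam))

def affine_form(pos, theta, theta0, lam=1.0):
    """Return A (2 x 2(N-1)) and b (2,) such that [Re G, Im G] = A @ x + b.

    The last weight is eliminated with the normalization G(theta0) = 1, and
    x stacks the real parts and then the imaginary parts of w_1..w_{N-1}.
    """
    s = np.exp(2j * np.pi * pos * np.cos(theta) / lam)
    s0 = np.exp(2j * np.pi * pos * np.cos(theta0) / lam)
    c = s[:-1] - s[-1] * s0[:-1] / s0[-1]        # complex coefficients of the free weights
    b_c = s[-1] / s0[-1]                         # constant term left by the eliminated weight
    A = np.vstack([
        np.concatenate([c.real, -c.imag]),       # real part of G
        np.concatenate([c.imag, c.real]),        # imaginary part of G
    ])
    return A, np.array([b_c.real, b_c.imag])

rng = np.random.default_rng(5)
N = 8
pos = 0.5 * np.arange(N)                         # half-wavelength spacing
theta0, theta = np.deg2rad(90.0), np.deg2rad(70.0)
w_free = rng.standard_normal(N - 1) + 1j * rng.standard_normal(N - 1)
s0 = np.exp(2j * np.pi * pos * np.cos(theta0))
w_last = (1.0 - np.sum(w_free * s0[:-1])) / s0[-1]
w = np.append(w_free, w_last)

A, b = affine_form(pos, theta, theta0)
x_real = np.concatenate([w_free.real, w_free.imag])
G = pattern(w, pos, theta)
```

The squared norm of `A @ x_real + b` equals the squared pattern amplitude in direction `theta`, which gives one quadratic constraint with m = 2 and n = 2(N-1).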

A similar expression can be derived for the positive power $w^H R w$, so that any constrained power can be included in problem (2) with corresponding c = 0 and d giving the power constraint. A difference is that here m depends on N. The objective to be minimized is also in general one of the previous quadratic expressions $\|A_0 x + b_0\|^2$, so that we can replace it with the minimization of $t = e^T x$ with the constraint $\|A_0 x + b_0\|^2 \le t$. In this case $c_0 = e$ and $d_0 = 0$ using the formulation (3). This also implies that x is modified in order to add the new variable t, so that n = 2N - 1. Finally, we could also add constraints on the weight norm, which can be very interesting for the signal-to-noise ratio.

We now want to show, through some simulation examples, the interesting applications of interior point methods to antenna array processing.

4  Simulation  Examples 

Applications of convex optimization to antenna array processing are numerous [6, 5]. We show here two kinds of applications: creating broad gaps in the pattern for interference rejection, with applications to adaptive beamforming, and designing antenna patterns with robustness properties.

4.1  Adaptive  arrays 

In adaptive array problems, it is generally desired to put zeros in directions corresponding to interferences. Nevertheless, it is sometimes more efficient to create a broad angular zone where the pattern is minimum even if not zero.


4.1.1  Constrained  pattern 

As an example, we deal with the minimization of the pattern level around 70° while keeping a 0 dB level at 90° for a 32-element linear regular array. The element distance is half a wavelength. The minimized area is 15° wide around 70°, and we also want the pattern level to remain less than -12 dB in the sidelobe area (except of course the minimized region).

The problem is discretized in the angular directions in order to be expressed with the general form (2). Figure 1 presents the optimal result. Because the problem is convex,
it  is  possible  to  state  that  within  the  required  precision  (which 
is  here  lO*"^),  it  is  impossible  to  find  weights  giving  a  better 
rejection  level  with  the  given  constraints.  It  is  also  possible 
to  compare  the  results  with  a  given  adaptive  technique. 

Figure 1. Optimized pattern (in dB) for interference rejection around 70 degrees. The solid line is the solution of the convex problem, whereas the dashed line gives the standard beamformer.


4.1.2  Constrained  adaptive  beamforming 

We can now take a slightly different approach. Constrained adaptive beamforming is an important issue, as recent articles show [2, 8]. The previous simulation could be criticized because the interference position needs to be known. Let us assume therefore that the region previously mentioned corresponds to clutter where the beam pattern level has to be less than -40 dB. We can use the signal covariance matrix $R$ as in standard array processing. For the simulation, we assume that this matrix is built with

• a signal of interest in direction 90° with level 0 dB,

• four interferences at positions 20°, 45°, 50° and 70°, with identical level (60 dB),

• a white noise density (-60 dB).

Figure 2 gives the result of standard adaptive beamforming with the same array as above, that is, the minimization of $w^H R w$ subject to $G(90^\circ) = 1$. The four interferences are eliminated as expected. The new problem becomes the minimization of the signal power $w^H R w$ with constraints on the clutter zone (less than -40 dB), the mainlobe zone (less than 0.08 dB) and the sidelobe zone (less than -12 dB), and with a normalization constraint of 0 dB at 90°. Figure 3 gives the beam pattern for the constrained adaptive beamforming problem. The interferences are once again cancelled; furthermore, the constraints on the clutter zone are achieved.

Figure 2. Interference rejection through standard adaptive beamforming.

Figure 3. Interference rejection through constrained adaptive beamforming.
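The standard adaptive beamformer of Figure 2 has the well-known closed form $w = R^{-1}a / (a^H R^{-1} a)$, where $a$ is the steering vector of the look direction. The sketch below rebuilds the simulated covariance from the three ingredients listed above and computes these weights. It assumes a half-wavelength uniform linear array with angles measured from the array axis, treats the dB levels as powers, and uses a small hand-written linear solver to stay within the standard library; the function names are hypothetical. The constrained problem of Figure 3 needs a convex solver and is not reproduced here.

```python
import cmath
import math

def steering(theta_deg, n_elem):
    """Assumed steering vector of a half-wavelength uniform linear array."""
    phase = math.pi * math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * phase * k) for k in range(n_elem)]

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def covariance(n_elem, sources, noise_db):
    """R = sum_k p_k a_k a_k^H + sigma^2 I for (angle_deg, level_dB) sources."""
    R = [[0j] * n_elem for _ in range(n_elem)]
    for angle, level_db in sources:
        a = steering(angle, n_elem)
        p = 10 ** (level_db / 10)
        for i in range(n_elem):
            for j in range(n_elem):
                R[i][j] += p * a[i] * a[j].conjugate()
    for i in range(n_elem):
        R[i][i] += 10 ** (noise_db / 10)
    return R

def mvdr_weights(R, a_look):
    """Minimize w^H R w subject to w^H a_look = 1."""
    x = solve(R, a_look)                                            # R^{-1} a
    denom = sum(ai.conjugate() * xi for ai, xi in zip(a_look, x))   # a^H R^{-1} a
    return [xi / denom for xi in x]

def gain_db(w, theta_deg):
    """Pattern level 20 log10 |w^H a(theta)|."""
    a = steering(theta_deg, len(w))
    G = sum(wi.conjugate() * ai for wi, ai in zip(w, a))
    return 20 * math.log10(max(abs(G), 1e-300))

N = 32
sources = [(90, 0), (20, 60), (45, 60), (50, 60), (70, 60)]
R = covariance(N, sources, -60)
w = mvdr_weights(R, steering(90, N))
```

Evaluating `gain_db(w, angle)` at the four interference angles shows the deep nulls, while `gain_db(w, 90)` stays at 0 dB by construction.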


4.2  Robustness  issues 

The problem of robustness is particularly important for antenna array design. We only briefly show some results here; more details can be found in [4]. The main ideas come from a series of papers by Evans [10] and Cantoni [1]. Here we are just interested in the robustness of the weights themselves. More precisely, it is known that the optimal weights have to be discretized for implementation. What is the influence of the quantization on the optimal results?

These problems can also be expressed as convex optimization problems, and Figure 4 shows such an example. The figure shows the optimal sidelobe level obtained with
quantization  steps  Aw  smaller  than  6.10~^.  This  means 
that the difference between any quantized weight and the corresponding optimal weight is less than $\Delta w$ in modulus. The solid line corresponds to a 10-element array with a mainlobe width of 25°, whereas the dashed line is for a 30-element array with a mainlobe width of 25°. In both cases, the mainlobe direction is 45°. As expected, the optimal sidelobe level increases with the quantization step.
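The meaning of the bound on the quantization error can be illustrated with a small sketch. It does not reproduce the robust optimization itself: it only quantizes a set of weights and checks how far the pattern can move. The uniform weights steered to 45°, the half-wavelength array model, the rounding quantizer and the numerical value of `delta_w` are all assumptions chosen for illustration.

```python
import cmath
import math

def steering(theta_deg, n_elem):
    """Assumed steering vector of a half-wavelength uniform linear array."""
    phase = math.pi * math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * phase * k) for k in range(n_elem)]

def quantize(weights, grid):
    """Round the real and imaginary parts of each weight to a uniform grid."""
    return [complex(round(w.real / grid) * grid, round(w.imag / grid) * grid)
            for w in weights]

def pattern(weights, theta_deg):
    """Array pattern, taken here as sum_k w_k a_k(theta)."""
    return sum(w * a for w, a in zip(weights, steering(theta_deg, len(weights))))

N = 10
delta_w = 6e-3                      # hypothetical bound on the weight error modulus
grid = math.sqrt(2) * delta_w       # rounding error per part <= grid/2, so modulus <= delta_w
w = [a.conjugate() / N for a in steering(45, N)]   # uniform weights steered to 45 degrees
w_q = quantize(w, grid)

max_err = max(abs(a - b) for a, b in zip(w, w_q))
# Each element response has unit modulus, so the pattern moves by at most N * max_err.
worst_shift = max(abs(pattern(w_q, t) - pattern(w, t)) for t in range(0, 181))
```

The inequality `worst_shift <= N * max_err` is the triangle inequality applied to the pattern sum; it is this kind of worst-case perturbation that the robust design has to absorb, which is why the achievable sidelobe level degrades as the step grows.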


Figure 4. Weight robustness: optimal sidelobe level vs. quantization step


5  Conclusion 

Through two examples, we tried to show the advantages of convex optimization. Of course, not all optimization problems are convex, but it is of much interest to recognize convex ones and use their properties. From our point of view, the main one is the ability to state whether a problem can be solved or not and, if it can be, to say with a guaranteed precision what the optimal result is. Another important question is the real-time capability of such algorithms, which are already very efficient. Advances in digital computers will help, although the problem remains open.

A  An  interior  point  algorithm 

The algorithm used to solve problem (2) minimizes the function

$$\phi(x, l) = \eta \log(e^T x - l) - \sum_{i=1}^{L} \log\left((c_i^T x + d_i) - \|A_i x + b_i\|^2\right) \qquad (4)$$

where $l$ is a lower bound on the objective, through

x_1 ← initial feasible point;
l_1 ← initial lower bound;
k ← 0;
repeat {
    k ← k + 1;
    minimize φ(·, l_k) through:
        y_1 ← x_k;  i ← 0;
        repeat {
            i ← i + 1;
            compute v_i, Newton direction of φ(y_i),
            then y_{i+1} ← y_i + α_i v_i with:
                α_i = argmin_α φ(y_i + α v_i);
        } until convergence;
    x_{k+1} ← y_{i+1};
    compute a new lower bound l_{k+1};
} until e^T x_{k+1} − l_{k+1} < tol;

Let us add that $\alpha_i$ is computed through a line search and the updated lower bound $l_k$ is obtained from the optimality conditions of the minimized function $\phi$.

Abstract

Recursive methods for subspace tracking, with applications to "on-line" direction of arrival estimation, have lately drawn considerable interest. In this paper, Instrumental Variable (IV) generalizations of the Projection Approximation Subspace Tracking (PAST) algorithm are proposed. The IV-approach is motivated by the fact that PAST delivers biased estimates when the noise vectors are not spatially white. The resulting basic IV-algorithm has a computational complexity of $3mn + O(n^2)$ complex multiplications, where $m$ is the dimension of the measurement vector and $n$ is the subspace dimension. The performance of the proposed algorithms in tracking sinusoids in colored noise is illustrated by computer simulations.


1  Introduction 

One aspect of the sensor array signal processing field that has drawn much attention is the application of high-resolution frequency and direction of arrival (DOA) estimation techniques to non-stationary environments, see for example [1]. A drawback of traditional subspace methods, in this scenario, is that the singular value decomposition (SVD) is time consuming to update. A specific example of a successful subspace tracking algorithm is the Projection Approximation Subspace Tracking (PAST) algorithm [5]. The basic idea of PAST is that a projection-like unconstrained criterion is approximated, which leads to an RLS-like algorithm for tracking the signal subspace. The DOA (or frequency) estimates can then be taken as the angles of the eigenvalues of a matrix obtained using the shift-invariant structure of the subspace (Uniform Linear Array, ULA). However, PAST assumes that the noise is spatially white, and tends to deliver biased estimates whenever this requirement is not fulfilled. This fact is the motivation of the algorithms proposed herein. The aim of this paper is to present Instrumental Variable (IV) generalizations of PAST. For a treatment of IV methods in the context of identifying linear systems, see [3]. Like all other IV-methods, we require that an IV-vector that is uncorrelated with the noise vector can be found. As long as this requirement is fulfilled, the noise vectors can be allowed to have arbitrary (temporal and spatial) color. A certain rank condition must also be fulfilled. One possible approach to find the instruments is to consider an array that is divided into sub-arrays. The outputs of one of the sub-arrays can then be taken as instruments. If the sub-arrays are sufficiently far apart, the noise in the main sub-array is uncorrelated with the IV vector. For a discussion on temporal IVs, see [4]. In the following, $\mathcal{R}(A)$ denotes the subspace spanned by the columns of $A$ and $\rho(A)$ denotes the rank of $A$.

*This work was supported in part by the Swedish Research Council for Engineering Sciences (TFR).

2  Problem  Formulation 

Let $z(t) \in \mathbb{C}^m$ be the observed data vector. In the array case, $z(t)$ consists of the samples of an array with $m$ sensors. In time series (sum of complex sinusoids) problems, $z(t) = [z(t), \ldots, z(t+m-1)]^T$ consists of $m$ consecutive samples of an observed scalar signal. It is assumed that $z(t)$ consists of $n$ narrow-band plane waves impinging on an antenna array, or $n$ complex sinusoids, corrupted by additive noise. Here the subspace dimension $n$, $n < m$, is assumed to be known. Hence, the following data model will be studied, see for example [5]:

$$z(t) = \Gamma x(t) + e(t) \qquad (1)$$

where $e(t)$ is additive noise with arbitrary covariance matrix $C_e = E[e(t)e^H(t)]$. The structure of $\Gamma$ may generally be arbitrary, but in this paper we focus on the special case of a ULA. The matrix $\Gamma$ is deterministic and is constructed as $\Gamma = [\boldsymbol{\gamma}(\omega_1) \cdots \boldsymbol{\gamma}(\omega_n)]$, where $\boldsymbol{\gamma}(\omega_k) = [1 \; e^{j\omega_k} \cdots e^{j(m-1)\omega_k}]^T$ is a so-called steering vector. Implicit in the definitions above is that the subspace $\mathcal{R}(\Gamma)$ might be slowly time-varying, i.e. $\mathcal{R}(\Gamma) = \mathcal{R}(\Gamma(t))$.

With samples $z(t)$, $t = 1, \ldots$, we are interested in deriving an efficient algorithm which estimates $\mathcal{R}(\Gamma)$ at time instant $t$, given the subspace estimate at time instant $t-1$ and the sample $z(t)$. The following assumptions are typical for IV approaches. Assume that there exists an IV vector $\xi(t) \in \mathbb{C}^l$, $l \ge n$, such that

A1: $E[e(t)\xi^H(t)] = 0$

A2: $\rho(E[x(t)\xi^H(t)]) = \rho(C_{x\xi}) = n$

Assumption A2 is made in order to ensure that $\rho(\Gamma C_{x\xi}) = n$, which implies that $\mathcal{R}(\Gamma C_{x\xi}) = \mathcal{R}(\Gamma)$. For the time series case, this assumption is discussed in [2]. Assumption A2 is not necessary for guaranteeing DOA identifiability, see [4]. However, for reasons of simplicity, it is assumed to hold throughout the paper.

3  Basic  IV  algorithm  (IV-PAST) 

Consider the solutions to ($W \in \mathbb{C}^{m \times n}$)

$$V(W) = E[z(t)\xi^H(t)] - W W^H E[z(t)\xi^H(t)] = \Gamma C_{x\xi} - W W^H \Gamma C_{x\xi} = 0. \qquad (2)$$

Provided A2 is fulfilled, by definition of the orthogonal projector, all solutions to (2) will be of the form $W = U_\Gamma T$, where $\mathcal{R}(U_\Gamma) = \mathcal{R}(\Gamma)$, $U_\Gamma \in \mathbb{C}^{m \times n}$ has orthonormal columns, and $T \in \mathbb{C}^{n \times n}$ is an arbitrary unitary matrix. Thus, for all solutions to (2) we have that

$$W W^H = \Pi_\Gamma = \Gamma(\Gamma^H \Gamma)^{-1}\Gamma^H = U_\Gamma U_\Gamma^H \qquad (3)$$

is the orthogonal projector onto the space spanned by the columns of $\Gamma$. To derive a practical algorithm, consider the solutions to (compare with (2))

$$V(W(t)) = \sum_{k=1}^{t} \gamma^{t-k}\left(z(k)\xi^H(k) - W(t)W^H(t)z(k)\xi^H(k)\right) = 0, \qquad (4)$$

where $\gamma$ is the forgetting factor ($0 < \gamma < 1$). Using the projection approximation idea of [5], $h(k) = W^H(k-1)z(k)$, then gives

$$W(t) = C_{z\xi}(t)\,C_{h\xi}^{-1}(t) \qquad (5)$$

with obvious definitions of the estimates of the covariance matrices. Using the matrix inversion lemma, the following algorithm is obtained:

$$h(t) = W^H(t-1)z(t) \qquad (6a)$$
$$e(t) = z(t) - W(t-1)h(t) \qquad (6b)$$
$$W(t) = W(t-1) + e(t)K(t) \qquad (6c)$$
$$P(t) = \frac{1}{\gamma}\left(P(t-1) - P(t-1)h(t)K(t)\right) \qquad (6d)$$
$$K(t) = \frac{\xi^H(t)P(t-1)}{\gamma + \xi^H(t)P(t-1)h(t)} \qquad (6e)$$

where $P(t) = C_{h\xi}^{-1}(t)$. In the above we have assumed that initial values $W(0)$, $P(0)$ are given. These initial values only affect the transient behavior and are not important for the steady-state performance of the algorithm. They can for example be taken as any full-rank matrices.

Due to the introduced approximations, the columns of $W(t)$ will not be orthonormal. However, simulations show that they are 'nearly' orthonormal. Some applications may require orthonormal columns, which may call for a reorthogonalization scheme such as Gram-Schmidt. However, in our simulations no orthogonalization is performed.

Note that we have constrained the dimension of the IV-vector $\xi(t)$ to $l = n$, which implies that no rank-reduction of the sample cross covariance matrix is performed. So, why not take $W(t) = C_{z\xi}(t)$? The main motivation is that the matrix $C_{h\xi}^{-1}(t)$ post-multiplying in (5) forces the columns of $W(t)$ to be 'nearly' orthonormal, resulting in good conditioning. Thus, IV-PAST can be thought of as a simple way to approximately orthogonalize the columns of $C_{z\xi}(t)$. The basic IV-algorithm will also serve as a preview of a more general rank-reducing IV-approach described in the following section.
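One time step of the recursion (6a)-(6e) can be sketched as follows. This is an illustrative transcription, not the authors' code: matrices are plain lists of lists, the function name `iv_past_step` is hypothetical, and no safeguard is included for a vanishing denominator in (6e).

```python
def iv_past_step(W, P, z, xi, gamma):
    """One IV-PAST update following (6a)-(6e).

    W: m x n subspace estimate, P: n x n inverse of C_h_xi,
    z: length-m data vector, xi: length-n instrument vector,
    gamma: forgetting factor. Returns the updated (W, P).
    """
    m, n = len(W), len(P)
    # (6a) h = W^H z
    h = [sum(W[i][j].conjugate() * z[i] for i in range(m)) for j in range(n)]
    # (6b) e = z - W h
    e = [z[i] - sum(W[i][j] * h[j] for j in range(n)) for i in range(m)]
    # (6e) K = xi^H P / (gamma + xi^H P h), a 1 x n row vector
    xi_P = [sum(xi[i].conjugate() * P[i][j] for i in range(n)) for j in range(n)]
    denom = gamma + sum(xi_P[j] * h[j] for j in range(n))
    K = [v / denom for v in xi_P]
    # (6c) W <- W + e K (rank-one correction)
    W_new = [[W[i][j] + e[i] * K[j] for j in range(n)] for i in range(m)]
    # (6d) P <- (P - P h K) / gamma
    P_h = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    P_new = [[(P[i][j] - P_h[i] * K[j]) / gamma for j in range(n)] for i in range(n)]
    return W_new, P_new
```

A small hand-checkable case: with $m = 2$, $n = 1$, $W(0) = [1\ 0]^T$, $P(0) = 1$, $z = [2\ 1]^T$, $\xi = 1$ and $\gamma = 0.5$, the step gives $h = 2$, $e = [0\ 1]^T$, $K = 0.4$, hence $W(1) = [1\ 0.4]^T$ and $P(1) = 0.4$, which agrees with $C_{h\xi}(1) = 0.5 + 2 = 2.5$ and $W(1) = C_{z\xi}(1)P(1)$.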


4  Extended  IV-algorithm  (EIV-PAST) 


A straightforward extension of the previous discussion, $l > n$, leads to the following criterion

$$V(W(t)) = \left\| \sum_{k=1}^{t} \gamma^{t-k}\left(z(k)\xi^H(k) - W(t)W^H(t)z(k)\xi^H(k)\right) \right\|_F^2 = \left\| C_{z\xi}(t) - W(t)W^H(t)C_{z\xi}(t) \right\|_F^2 \qquad (7)$$

where $W(t) \in \mathbb{C}^{m \times n}$ and $C_{z\xi}(t) \in \mathbb{C}^{m \times l}$. This approach corresponds to what in [3, Section 8] is called the Extended IV estimate. Without loss of generality we assume that $\rho(W(t)) = n$. With probability 1 (w.p.1), $\rho(C_{z\xi}(t)) = \min(m, l) \ge n$, but $\rho(C_{z\xi}) = n \le l$. Consequently, a low-rank approximation of $C_{z\xi}(t)$ is desired. Thus, the following theorem is needed.

Theorem 1 Let $C_{z\xi}(t)$ have the SVD

$$C_{z\xi}(t) = U S V^H = \begin{bmatrix} U_s & U_n \end{bmatrix} \begin{bmatrix} S_s & 0 \\ 0 & S_n \end{bmatrix} \begin{bmatrix} V_s^H \\ V_n^H \end{bmatrix} \qquad (8)$$

where $U_s \in \mathbb{C}^{m \times n}$. The remaining partitions are of appropriate dimensions. $W(t)$ is a stationary point of (7) if and only if $W(t) = \tilde{U} T$, where $\tilde{U}$ denotes any $n$ left singular vectors (columns of $U$) and $T \in \mathbb{C}^{n \times n}$ denotes an arbitrary unitary matrix. All stationary points of $V(W(t))$ are saddle points except when $\tilde{U} = U_s$. In this case $V(W(t))$ attains the global minimum. Note that for this choice, $W(t)W^H(t)C_{z\xi}(t) = U_s S_s V_s^H$, which in the sense of the Frobenius norm is the best possible rank $n$ approximation of $C_{z\xi}(t)$.

Proof: See [2]. □

Once again the projection approximation is applied:

$$W^H(t)C_{z\xi}(t) = W^H(t)\sum_{k=1}^{t}\gamma^{t-k}z(k)\xi^H(k) \approx \sum_{k=1}^{t}\gamma^{t-k}\underbrace{W^H(k-1)z(k)}_{h(k)}\xi^H(k) = C_{h\xi}(t) \qquad (9)$$

which gives the (quadratic) criterion

$$V(W(t)) = \left\| C_{z\xi}(t) - W(t)C_{h\xi}(t) \right\|_F^2. \qquad (10)$$

The minimizing argument of (10) is given by

$$W(t) = C_{z\xi}(t)\,C_{h\xi}^{\dagger}(t) \qquad (11)$$

where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudo-inverse. This approach will in most cases improve the accuracy of the estimates. For example, in Section 5 we will see that the tracking capabilities are much more 'well-behaved' in this case. In Appendix A an efficient ($8ml + O(mn)$ complex multiplications) recursive updating formula of (11) is given. Note that the matrix inversion that arises in (14d) is of size ($2 \times 2$), so it is a simple matter to invert it.

Figure 1. One realization of the frequency estimates. SNR = 5 dB, $\gamma = 0.97$.

5 Examples

Consider the scalar signal

$$z(t) = \sum_{j=1}^{2} a_j \cos(2\pi f_j(t)\,t + \varphi_j) + e(t) \qquad (12)$$

where $a_1 = a_2 = \sqrt{2}$. The random phases $\varphi_j$ are independent and uniformly distributed in $(-\pi, \pi)$. Thus, $n = 4$. Choose $m = 8$, which gives $z(t) = [z(t), \ldots, z(t+7)]^T$. The (temporal) IV-vector is chosen as $\xi(t) = [z(t-M), \ldots, z(t-M-l+1)]^T$ with $M = 11$. The number of instruments $l$ is $l = 4$ for IV-PAST, and $l = m = 8$ for EIV-PAST. The frequencies are estimated using the ESPRIT approach, i.e. the angles of the eigenvalues of $W_{2:m}^{\dagger}(t)W_{1:m-1}(t)$, where $W_{i:j}$ denotes rows $i$ to $j$ of $W$. For all algorithms, the following initial values were used:

$$P(0) = I_n, \qquad W(0) = [I_n \;\; 0_{(m-n),n}^T]^T. \qquad (13)$$

However,  the  transient  is  typically  not  shown.  In  the 
simulations,  e(t)  =  — where  q~^  is  the  de¬ 
lay  operator  and  £{t)  is  white  Gaussian  noise.  Note 
that for this noise process, condition A1 is violated: $E[e(t)\xi^H(t)] \neq 0$. In the simulations, $\gamma = 0.97$. From Fig. 1 we see that the IV based approach clearly reduces the bias compared with PAST. Based on this observation, further simulations with PAST are omitted. The next example illustrates the tracking performance of the algorithms. The performance is also compared with that of the frequency estimates obtained from the $n$ dominant left singular vectors of $C_{z\xi}(t)$. We consider a step-change in a frequency, see Fig. 2 and Fig. 3. In Fig. 3 we have used the following measure of deviation from orthonormality: $\|W^H(t)W(t) - I_n\|_F$. The basic IV-algorithm shows a tendency to 'over-shoot', but this behavior is reduced by EIV-PAST. This is perhaps the major improvement offered by the Extended IV method. Note also that the estimates of the constant frequency are less affected for EIV-PAST than for IV-PAST, and that it is almost impossible to distinguish the EIV-PAST estimates from those of the SVD, except during the transient phase.

Figure 2. One realization of the frequency estimates. SNR = 8 dB, $\gamma = 0.97$.

Figure 3. Deviation from orthonormality for a step-change. SNR = 8 dB, $\gamma = 0.97$.

6  Conclusions 

In this paper, Instrumental Variable generalizations of the subspace tracking algorithm PAST have been proposed. The presented algorithms are able to track slowly time-varying subspaces in colored noise fields. One requirement is that we must be able to find an IV-vector that is uncorrelated with the noise vector. Additionally, a certain rank requirement must be fulfilled. The conclusion is that an IV approach, in our examples, improves the results when the noise is not spatially white.

A  Appendix 

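In this appendix we give the recursive updating formulas for the Extended IV-PAST algorithm, where $P(t) = \left[C_{h\xi}(t)C_{h\xi}^H(t)\right]^{-1}$. See [2] for a derivation.

$$W(t) = W(t-1) + e(t)K(t) \qquad (14a)$$
$$e(t) = v(t) - W(t-1)\psi(t) \qquad (14b)$$
$$X(t) = \Lambda(t) + \psi^H(t)P(t-1)\psi(t) \qquad (14c)$$
$$K(t) = X^{-1}(t)\psi^H(t)P(t-1) \qquad (14d)$$
$$\psi(t) = [\,w(t) \;\; h(t)\,] \qquad (14e)$$
$$w(t) = C_{h\xi}(t-1)\xi(t) \qquad (14f)$$
$$\Lambda(t) = \begin{bmatrix} -\xi^H(t)\xi(t) & \gamma \\ \gamma & 0 \end{bmatrix} \qquad (14g)$$
$$v(t) = [\,C_{z\xi}(t-1)\xi(t) \;\; z(t)\,] \qquad (14h)$$
$$C_{h\xi}(t) = \gamma C_{h\xi}(t-1) + h(t)\xi^H(t) \qquad (14i)$$
$$C_{z\xi}(t) = \gamma C_{z\xi}(t-1) + z(t)\xi^H(t) \qquad (14j)$$
$$h(t) = W^H(t-1)z(t) \qquad (14k)$$
$$P(t) = \frac{1}{\gamma^2}\left(P(t-1) - P(t-1)\psi(t)K(t)\right) \qquad (14l)$$

Abstract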

The performance of DF-based beamformers is seriously degraded in situations where the array is imprecisely calibrated, or when the spatial coherence of the signal wavefronts is perturbed. When the calibration errors or perturbation may be characterized by a set of parameters drawn from a known Gaussian distribution, a maximum a posteriori (MAP) estimator may be used to separately estimate the directions of arrival and the perturbation parameters, resulting in essentially an on-line auto-calibration. This paper examines the improvement that results from using the MAP auto-calibrated steering vectors in standard DF-based beamformers to estimate the received signal waveforms and suppress unwanted interference. For the special case of additive unstructured calibration errors and uncorrelated signals, it is shown that the MAP beamformer is similar in form to so-called "subspace corrected" approaches.


1.  Introduction 

All methods for direction-finding (DF) and DF-based beamforming rely on the availability of information about the array response, and assume the signal wavefronts have perfect spatial coherence. Depending on the degree to which the actual response or wavefronts differ from their nominal values, DF and beamformer performance may be significantly degraded. To account for these types of perturbations, a slightly generalized model for the array response will be considered in this paper. The response will be parameterized not only by the directions of arrival (DOAs) of the signals, but also by a vector of perturbation or "nuisance" parameters that describe deviations of the response from its nominal value. These parameters can include, for example, displacements of the antenna elements from their nominal positions, uncalibrated receiver gain and phase offsets, etc. With such a model, a natural approach is to attempt to estimate the unknown nuisance parameters simultaneously with the signal parameters. Such methods are referred to as auto-calibration techniques, and have been proposed by a number of authors, including [1, 2, 3, 4] among many others.

When auto-calibration techniques are employed, it is critical to determine whether both the signal and nuisance parameters are identifiable. In certain cases they are not; for example, one cannot uniquely estimate both DOAs and sensor phase characteristics (unless of course additional information is available, such as sources in known locations, etc.). The identifiability problem can be alleviated if the perturbation parameters are assumed to be drawn from some known a priori distribution. While this itself represents a form of additional information, it has the advantage of allowing an optimal maximum a posteriori (MAP) solution to the problem to be formulated. In [4] it is shown that, by using an asymptotically equivalent approximation to the resulting MAP criterion, the estimation of the signal and nuisance parameters can be decoupled, leading to a significant simplification of the problem.

Presumably, any of the above auto-calibration methods would provide not only improved DOA estimates, but also calibration information that would be useful in beamformer implementation. In this paper, beamformer performance is investigated for the case where the optimal MAP perturbation parameter estimates of [4] are used to update the array calibration. Simulations demonstrate that such an approach can result in a significant performance improvement, measured using either interference rejection capability or mean-squared error. In addition, for simple additive unstructured calibration errors, the MAP approach is shown in certain cases to yield a beamformer similar to the subspace corrected algorithms described in [5, 6].


2.  Mathematical  Model  and  Algorithms 

The response of an arbitrary array of $m$ sensors for a given DOA $\theta$ will be denoted by the $m$-vector $a(\theta, \rho)$, which is parameterized by a real vector $\rho$ that describes the array perturbation. The array output is then modeled by the following familiar equation:

$$x(t) = [\,a(\theta_1, \rho) \;\cdots\; a(\theta_d, \rho)\,] \begin{bmatrix} s_1(t) \\ \vdots \\ s_d(t) \end{bmatrix} + n(t) \qquad (1)$$
$$\phantom{x(t)} = A(\theta, \rho)s(t) + n(t), \qquad (2)$$

where $s(t)$ and $n(t)$ represent the received signals and noise, respectively. It will be assumed that for a given collect, $N$ samples are taken from the array. Both $s(t)$ and $n(t)$ are assumed to be temporally white zero-mean complex Gaussian random processes, with covariances given by $P$ and $\sigma^2 I$, respectively. The perturbation term $\rho$ is also assumed to be drawn from a Gaussian distribution with known mean $\rho_0$ (corresponding to the nominal, unperturbed array response) and covariance $\Omega$.

Given the above, the covariance of the array output and its eigendecomposition may be written as

$$R = A(\theta, \rho)\,P\,A^*(\theta, \rho) + \sigma^2 I = E_s \Lambda_s E_s^* + \sigma^2 E_n E_n^*,$$

where $\Lambda_s$ contains the $d$ largest eigenvalues, and the columns of the $m \times d$ matrix $E_s$ are the corresponding unit-norm eigenvectors. Similarly, the columns of $E_n$ are the $m - d$ eigenvectors corresponding to $\sigma^2$.

2.1.  An  Asymptotic  MAP  Estimator 

In [4], it is shown that estimates of $\theta$ and $\rho$ asymptotically equivalent to those from the exact MAP estimator may be obtained by setting¹

0  =  argmin  a^Mag  —  (3) 

0 

p  =  Po  -  r-if ,  (4) 

where 

ao  =  vec{A(0,Po))  ,  M  =  ®  {E„E*)  (5) 

U  =  (7-2 A^E,A2A7'E:A^*  ,  f  =  Re{D;Mao}  (6) 

f  =  Re|D;MD^  +  ^n-'|  (7) 

^  [aa{e,p)  aa(0,p)l 

L  -I  fi,po 

and  where  and  A  are  “consistent”  estimates  deter¬ 
mined  from  some  initial  estimation  step.  The  above 
approach is quite general in that, by proper choice of $\rho$, it can be applied to arbitrary types of model errors. Another key advantage is that estimation of $\theta$ and $\rho$ is decoupled; a search is required only for the $d$ DOA parameters in $\theta$, and not for $\rho$ (which is calculated directly given $\hat{\theta}$). Other properties of the algorithm are outlined in [4].


¹Strictly speaking, the equivalence of the above estimator and the optimal MAP approach only holds for first order errors $\rho - \rho_0$ that are "of the same order" as the finite sample effects of the noise. In other cases (particularly those where model errors are dominant), a different approach should be used. For more details, see [4, 7].

2.2. Optimal Beamformers

The minimum mean squared error (MSE) beamformer weights are easily shown to be

$$W_{MSE} = R^{-1}R_{xs} = R^{-1}A(\theta, \rho)P. \qquad (9)$$

When the desired signal is uncorrelated with the interference, $P$ is diagonal and the minimum MSE solution is just a scaled version of the so-called minimum variance distortionless response (MVDR) beamformer:

$$w = \frac{R^{-1}a(\theta)}{a^*(\theta)R^{-1}a(\theta)}. \qquad (10)$$

In the general case where the signal and interference are correlated, the optimal weights depend on the signals themselves through $R_{xs}$ or $P$, and thus they cannot be used directly (i.e., without a training sequence, for example). In the approach of [8], the quantities $P$ and $R$ in (9) are replaced by their structured ML estimates:

$$\hat{P}_s = A_0^{\dagger}(\hat{R} - \hat{\sigma}^2 I)A_0^{\dagger *}, \qquad \hat{R}_s = A_0 \hat{P}_s A_0^* + \hat{\sigma}^2 I,$$

where $A_0 = A(\hat{\theta}, \rho_0)$, $(\cdot)^{\dagger}$ denotes a (left) pseudo-inverse, and $\hat{R}$ is a sample estimate of $R$.

Since calibration errors were not addressed in [8], the nominal model $\rho_0$ was used to calculate the beamformer weights. Nevertheless, the method performs well when calibration errors are present, as recently demonstrated in [9]. On the other hand, the MVDR approach is well known to be hyper-sensitive to array perturbations, especially at high SNR. While ad hoc methods employing artificial noise injection have been used to combat this problem, other techniques based on subspace corrected (SC) weights have found success in experimental systems [5, 6]. In these approaches, the $R^{-1}$ term in (10) is replaced by $E_s \Lambda_s^{-1} E_s^*$. This is equivalent to projecting $a(\theta)$ onto the signal subspace prior to forming the MVDR weights.

One of the goals of this paper is to study the improvement that results from using the method of [8] with $A(\hat{\theta}, \hat{\rho})$ rather than $A(\hat{\theta}, \rho_0)$, where $\hat{\rho}$ is obtained from the MAP estimator in (4). This approach will be referred to as the MAP beamformer in the sequel. In the next section, an interesting connection is made between the MAP beamformer and the SC-MVDR method. In particular, it is shown that for simple unstructured array errors and uncorrelated signals, the SC-MVDR and MAP weights have a very similar form.

3. Some Special Cases

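For the moment, consider the following simple unstructured model for the perturbed array response:

$$A(\theta, \rho) = A(\theta) + \tilde{A} \qquad (11)$$
$$\rho = \begin{bmatrix} \mathrm{Re}\{\mathrm{vec}(\tilde{A})\} \\ \mathrm{Im}\{\mathrm{vec}(\tilde{A})\} \end{bmatrix} \qquad (12)$$

where the columns of $\tilde{A}$, denoted $\tilde{a}_i$, are modeled as zero mean Gaussian random vectors with moments

$$E[\tilde{a}_i \tilde{a}_k^*] = \upsilon_{ik} I, \qquad E[\tilde{a}_i \tilde{a}_k^T] = 0, \qquad i, k = 1, \ldots, d. \qquad (13)$$

This model corresponds to an additive, circularly symmetric complex array perturbation that is uncorrelated from sensor to sensor, but possibly $\theta$-dependent. It is easy to verify that under these assumptions, the covariance of $\rho$ is given by

$$\Omega = \frac{1}{2}\begin{bmatrix} \mathrm{Re}\{\Upsilon\} \otimes I & -\mathrm{Im}\{\Upsilon\} \otimes I \\ \mathrm{Im}\{\Upsilon\} \otimes I & \mathrm{Re}\{\Upsilon\} \otimes I \end{bmatrix}, \qquad (14)$$

where the $i,k$-th element of the matrix $\Upsilon$ is $\upsilon_{ik}$. It is interesting to examine the form of the MAP estimate $\hat{\rho}$ for this case. To begin with, note that for the above model $\rho_0 = 0$ and $D_\rho = [\,I \;\; jI\,]$, where $I$ is $md \times md$. Thus, $\hat{\rho} = -\Gamma^{-1}f$ and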

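$$\Gamma = \begin{bmatrix} \mathrm{Re}\{M + \frac{1}{N}\Upsilon^{-1} \otimes I\} & -\mathrm{Im}\{M + \frac{1}{N}\Upsilon^{-1} \otimes I\} \\ \mathrm{Im}\{M + \frac{1}{N}\Upsilon^{-1} \otimes I\} & \mathrm{Re}\{M + \frac{1}{N}\Upsilon^{-1} \otimes I\} \end{bmatrix}, \qquad f = \begin{bmatrix} \mathrm{Re}\{M a_0\} \\ \mathrm{Im}\{M a_0\} \end{bmatrix}.$$

Using the fact that, for any invertible matrix $Z$,

$$\begin{bmatrix} \mathrm{Re}\{Z\} & -\mathrm{Im}\{Z\} \\ \mathrm{Im}\{Z\} & \mathrm{Re}\{Z\} \end{bmatrix}^{-1} = \begin{bmatrix} \mathrm{Re}\{Z^{-1}\} & -\mathrm{Im}\{Z^{-1}\} \\ \mathrm{Im}\{Z^{-1}\} & \mathrm{Re}\{Z^{-1}\} \end{bmatrix}, \qquad (15)$$

we obtain

$$\hat{\rho} = -\begin{bmatrix} \mathrm{Re}\left\{\left(M + \frac{1}{N}\Upsilon^{-1} \otimes I\right)^{-1} M a_0\right\} \\ \mathrm{Im}\left\{\left(M + \frac{1}{N}\Upsilon^{-1} \otimes I\right)^{-1} M a_0\right\} \end{bmatrix}. \qquad (16)$$

A further simplification of (16) is possible that is quite revealing. Using the definition of $M$ in (5), note that

$$\left(M + \frac{1}{N}\Upsilon^{-1} \otimes I\right)^{-1} = \left(U^T + \frac{1}{N}\Upsilon^{-1}\right)^{-1} \otimes (E_n E_n^*) + N\Upsilon \otimes (E_s E_s^*). \qquad (17)$$

Furthermore, if the estimated MAP array response is used in (10), the MVDR beamformer (10) will converge to the SC-MVDR approach. The condition $\Upsilon^{-1}/N \to 0$ occurs either with a large data sample, or when the array perturbation is large. In either case, the information provided by the prior distribution of $\rho$ is of little value, and is essentially ignored by the MAP criterion. This observation provides some theoretical justification for the SC-MVDR technique, which previously had been derived using ad hoc (but well motivated) reasoning. However, in cases where the prior cannot be neglected, using SC response vectors for beamforming will not be optimal and significant degradation can result. This is seen in the simulation examples described later.

3.1. Gain and Phase Errors

For arrays composed of nominally identical elements, a common approach used to describe deviations in the array response attempts to model the non-uniform gain and phase effects of the receiver electronics behind each antenna element. In this model, the nominal response is perturbed by an unknown complex diagonal matrix:

$$A(\theta, \rho) = G A(\theta), \qquad \rho = \begin{bmatrix} \mathrm{Re}\{g\} \\ \mathrm{Im}\{g\} \end{bmatrix} \qquad (19)$$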


where  g  =  diag{G}.  The  mean  of  the  distribution  for 
p  in  this  case  is  given  by  p^  =  [e^  0]^,  where  e  is  an 
m  X  1  vector  of  ones.  For  simplicity,  in  this  discussion 
the  covariance  of  p  will  be  assumed  to  be  D  =  {al/2)l, 
which  implies  that  the  individual  gain  and  phase  errors 
are  all  mutually  independent  and  identically  distributed. 

The  derivation  of  the  MAP  estimate  of  p  and  hence  g 
is  straightforward  but  somewhat  cumbersome,  and  thus 
will  not  be  presented  here.  However,  the  result  is  quite 
simple,  and  is  given  by 


g=  (l  +  <T2fVZ)“'e 


(20) 


Multiplying  this  equation  on  the  right  by  Mao  sind  sim¬ 
plifying  then  yields 


O  rr  — 

’  Re{ 

■(l+i(TU^)-i)-^®(E„E;)' 

ao  I 

H  — 

Im{ 

>+i(TU^)-i)-'®(E„E;)' 

ao} 

Finally,  using  (12)  and  properties  of  the  Kronecker  prod¬ 
uct,  the  MAP  estimate  of  the  array  response  becomes 


Z  = 


d 

Ukia{9i)a^0k) 


i,k=l 


o(e„e;: 


(21) 


where  Uki  is  the  k,i^^  element  of  U,  (•)  denotes  conjuga¬ 
tion,  and  ©  an  element-wise  (Hadamard)  product.  Note 
that  for  very  small  gain/phase  errors  where  era  0, 
g  — *  e  and  hence  G  — ►  I  as  expected. 


A(0,p)=A(0)-E„E;A(0)(l-fl(TU^)-i)  '  . 

(18) 

The  key  point  of  interest  is  that,  if  /N  — >  0,  then 
the  MAP  estimate  of  the  array  response  converges  to  a 
subspace  corrected  version  of  the  nominal  response: 

lim  A(0,p)  =  E,E!A(^). 
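The block-matrix identity for an invertible complex matrix $Z$, used to pass from (15) to (16), is easy to check numerically. The following is only a minimal sketch: the helper name `real_block`, the matrix size and the random test matrix are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def real_block(z):
    """Real representation [[Re z, -Im z], [Im z, Re z]] of a complex matrix z."""
    return np.block([[z.real, -z.imag], [z.imag, z.real]])

rng = np.random.default_rng(0)
n = 4  # illustrative size (assumption)
# Random complex matrix, shifted by a multiple of I so that it is safely invertible.
z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) + 5 * np.eye(n)

lhs = np.linalg.inv(real_block(z))      # inverse of the real block form
rhs = real_block(np.linalg.inv(z))      # real block form of the complex inverse
max_diff = np.max(np.abs(lhs - rhs))
print(max_diff)
```

The two sides agree to rounding error because the map from $Z$ to its real block form preserves matrix products, so it also preserves inverses.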


4.  Simulation  Results 

In this section, the performance of the MAP beamformer is studied by means of a number of simulation examples. The first example involves a nominally unit-gain uniform linear array perturbed by an unstructured calibration error in the form of equations (11)-(14) with $\Upsilon = \sigma_a^2 I$ and $\sigma_a = 0.2$. The array receives 100 samples of two 20 dB SNR uncorrelated Gaussian signals with arrival angles of 5° and 15°. Using DOA estimates from the optimal MAP estimator, the relative interference rejection capability of the MVDR, SC-MVDR, and MAP beamformers was calculated for various array sizes. The results are plotted in Figure 1 based on 500 independent trials. The plot shows the gain of the beamformer weights for the 5° source in the direction of the 15° interferer (normalized for a unit gain response at 5°). The subspace correction eliminates the signal cancellation effect of the MVDR approach, but the MAP beamformer provides a significant advantage, especially for larger arrays. The above simulation was repeated assuming receiver gain/phase errors as described by (19), also with $\sigma_a = 0.2$, and a plot almost identical to Figure 1 was obtained. Algorithm performance is seen in this case to depend very little on the type of calibration error encountered.

Figure 1: A Comparison of Beamformer Performance, Unstructured Calibration Errors

Figure 2: Root MSE Performance of Various Beamformers for a Multipath Channel
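The performance measure used in Figure 1, the gain of the weights for the 5° source in the direction of the 15° interferer, can be illustrated with a small sketch. It is not the simulation of the paper: it assumes a perfectly calibrated half-wavelength uniform linear array with 8 elements, the exact (not sample) covariance matrix, and unit noise power. The function names `ula_steering` and `mvdr_weights` are hypothetical.

```python
import numpy as np

def ula_steering(theta_deg, m, spacing=0.5):
    """Steering vector of an m-element ULA; spacing is in wavelengths (assumed 0.5)."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def mvdr_weights(cov, a):
    """MVDR weights w = R^{-1} a / (a^H R^{-1} a), so that w^H a = 1."""
    r_inv_a = np.linalg.solve(cov, a)
    return r_inv_a / (a.conj() @ r_inv_a)

m = 8                                   # number of sensors (assumption)
a_sig = ula_steering(5.0, m)            # desired source at 5 degrees
a_int = ula_steering(15.0, m)           # interferer at 15 degrees
snr = 10.0 ** (20.0 / 10.0)             # 20 dB SNR with unit noise power

# Exact covariance of two uncorrelated sources plus white noise.
cov = snr * (np.outer(a_sig, a_sig.conj()) + np.outer(a_int, a_int.conj())) + np.eye(m)

w = mvdr_weights(cov, a_sig)
gain_signal = abs(w.conj() @ a_sig)       # unit gain by construction
gain_interferer = abs(w.conj() @ a_int)   # small when the array response is known exactly
print(gain_signal, gain_interferer)
```

With an exact response the interferer is strongly suppressed; the degradation studied in this section appears once `a_sig` differs from the true, perturbed response.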

When the signals arriving at the array are highly correlated, interference rejection is no longer an appropriate performance criterion. In such cases, an optimal beamformer will attempt to combine correlated arrivals with the desired signal to improve the quality of the resulting estimate, as measured using (for example) mean-squared error. To examine beamformer performance for the case of correlated signals, a two-ray multipath channel was simulated for various relative delays. A miscalibrated 5-element linear array was assumed to receive a random QPSK signal from -6°, as well as a slightly delayed copy of the signal from 6°. Both arrivals had an SNR of 0 dB, and the array was again perturbed according to (11)-(14) with $\Upsilon = \sigma_a^2 I$ and $\sigma_a = 0.15$. For each trial, MAP DOA estimates were obtained based on 75 samples from the array, and normalized RMS signal errors were computed. The results are plotted in Figure 2 for various relative delays between the two arrivals. The "uncompensated" approach corresponds to the method of [8] implemented with $A(\theta, \rho_0)$ rather than $A(\theta, \hat{\rho})$ as in the MAP beamformer. The minimum MSE curve was obtained using a known 75-sample training sequence to compute the optimal weights, and was included to give an idea of the "best possible" performance.

While  the  SC-MVDR  approach  can  to  some  degree 
compensate  for  array  perturbations,  it  cannot  eliminate 
signal  cancelation  due  to  the  presence  of  a  correlated 
arrival,  and  its  performance  in  this  case  is  quite  poor. 
For  small  delays,  correcting  for  calibration  errors  yields 
a  25-30%  improvement  in  RMS  error,  which  translates 
into  a  reduction  in  symbol  error  rate  of  approximately 
a  factor  of  6  (from  .041  to  .007)  for  this  example. 
Abstract

An iterative algorithm (IVESPA) for narrow-band direction finding and waveform recovery is presented which is based on the virtual-ESPRIT algorithm (VESPA) of [1]. IVESPA can handle the case where the data length is short and some of the sources have very small higher-order statistics compared to others, in which case VESPA needs more data to localize the weak sources. IVESPA can be applied to uncalibrated and arbitrary-shape arrays provided the array has two sensors having identical response, which is the same requirement as in VESPA. Results of a real data experiment demonstrating IVESPA are presented.


1.  Introduction 

Estimating  the  parameters  of  narrow-band  signals  using 
an  array  of  sensors  has  been  a  very  attractive  problem  of 
research.  Typically,  the  parameters  of  interest  are  the  di¬ 
rections  of  arrival,  polarizations  and  the  waveforms  of  the 
signals.  Existing  approaches  to  this  problem  can  be  clas¬ 
sified  into  two  main  categories  as  the  so-called  subspace- 
and  nonsubspace-based  ones.  The  subspace-based  methods 
are usually preferred, because they yield high resolution results. These methods require eigendecomposition or singular value decomposition of an array covariance or cumulant matrix, depending on the particular subspace method used. From a configuration point of view, subspace methods based on the array covariance matrix are applicable to arrays which have either analytically-known response or identical but displaced subarrays, or calibrated arrays. Among
the  subspace  methods,  VESPA  of  Dogan  and  Mendel  [1], 
which  is  based  on  a  cumulant  matrix,  has  the  lightest  config¬ 
uration  requirements:  two  sensors  having  identical  response 
are  needed;  other  sensors  in  the  array  may  have  arbitrary  and 
unknown  responses  and  configurations. 

Like all subspace-based methods, VESPA relies on sample statistics of the array measurements which suffer from cross terms due to the presence of multiple sources. When some of the sources have very small powers and cumulants compared to those of other sources, undesirable cross terms are present in the sample statistics of the weak sources due to the other sources for small numbers of samples. In this case, VESPA fails to accurately localize the weak sources. In practice, this case occurs when the source signals have different constellations and significantly different power levels. Note that the denser the source constellation becomes, the smaller the cumulant of the signal becomes, because the signal looks more Gaussian. For example, fourth-order cumulants of unit-power BPSK, 4QAM and 16QAM signals are -2, -1 and -0.68, respectively. In addition, sources having small powers are deemphasized during the calculation of sample higher-order statistics, because higher than second-order powers of the data are computed. As an example of this case, we will present the results of a real data experiment, in Section 3, that involves three sources: a BPSK signal, another BPSK signal with power 11.23 dB below the first one and a 16QAM signal with power 22.10 dB below the first BPSK signal.
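The cumulant values quoted above can be reproduced directly from the constellations. The sketch below assumes equiprobable, zero-mean symbols and the usual definition cum(s, s*, s, s*) = E|s|^4 - 2(E|s|^2)^2 - |E[s^2]|^2; the function name `fourth_order_cumulant` is chosen here for illustration.

```python
import numpy as np

def fourth_order_cumulant(symbols):
    """cum(s, s*, s, s*) of a zero-mean, equiprobable constellation scaled to unit power."""
    s = np.asarray(symbols, dtype=complex)
    s = s / np.sqrt(np.mean(np.abs(s) ** 2))          # normalize to unit power
    m4 = np.mean(np.abs(s) ** 4)                      # E|s|^4
    m2 = np.mean(np.abs(s) ** 2)                      # E|s|^2 (equals 1 after scaling)
    c2 = np.mean(s ** 2)                              # E[s^2], zero for QAM, one for BPSK
    return float(m4 - 2.0 * m2 ** 2 - np.abs(c2) ** 2)

levels = np.array([-3.0, -1.0, 1.0, 3.0])
bpsk = np.array([-1.0, 1.0])
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

cumulants = {
    "BPSK": fourth_order_cumulant(bpsk),
    "4QAM": fourth_order_cumulant(qam4),
    "16QAM": fourth_order_cumulant(qam16),
}
for name, value in cumulants.items():
    print(name, round(value, 2))
```

The magnitude shrinks as the constellation becomes denser, which is why a weak 16QAM source contributes so little to the sample cumulant matrices.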

The problem is formulated and the solution is presented in Section 2. A real data experiment is presented in Section 3. Conclusions are provided in Section 4. Throughout this paper, lower-case boldface letters represent vectors; upper-case boldface letters represent matrices; and lower- and upper-case letters represent scalars. $\mathbf{A}(i, j)$ denotes the $ij$-th element of $\mathbf{A}$.


2. Problem Formulation and the Proposed Algorithm

Suppose that we have an $M$ element array containing an identical response pair of sensors. The other elements of the array may have arbitrary and unknown configuration and responses. Consider a signal scenario where there are $P$ independent narrowband signals having nonzero fourth-order cumulants which are received by the array from directions $\{\phi_1, \ldots, \phi_P\}$. Let $\mathbf{r}(t) = [r_1(t), \ldots, r_M(t)]^T$ be the received signal vector which can be expressed as

$$\mathbf{r}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t) \qquad (1)$$

where $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_P]$ is the $M \times P$ steering matrix, $\mathbf{s}(t) = [s_1(t), \ldots, s_P(t)]^T$ is the $P$-vector of independent sources and $\mathbf{n}(t)$ is a Gaussian noise process independent of the signals. The problem of interest is to estimate the directions $\{\phi_1, \ldots, \phi_P\}$ and to recover the sources $\{s_1(t), \ldots, s_P(t)\}$.

Before presenting the solution, we adopt the following notation for fourth-order cumulant matrices. Given two scalar processes $x_1(t)$ and $x_2(t)$ and an $M$-vector process $\mathbf{y}(t)$, we define $\mathrm{cum}(x_1(t), x_2(t), \mathbf{y}(t), \mathbf{y}(t)^H)$ as the $M \times M$ matrix whose $ij$-th entry is $\mathrm{cum}(x_1(t), x_2(t), y_i(t), y_j^*(t))$, where $y_i(t)$ and $y_j(t)$ are the $i$-th and $j$-th components of $\mathbf{y}(t)$, respectively.

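We propose the following iterative algorithm for estimating the directions of the signals:

Step 1: Estimate the following two fourth-order cumulant matrices:

$$\mathbf{C}_{11} = \mathrm{cum}(r_1(t), r_1^*(t), \mathbf{r}(t), \mathbf{r}(t)^H) = \sum_{p=1}^{P} \gamma_{4,p}\,|\mathbf{A}(1,p)|^2\, \mathbf{a}_p \mathbf{a}_p^H = \mathbf{A}\boldsymbol{\Lambda}\mathbf{A}^H \qquad (2)$$

and

$$\mathbf{C}_{12} = \mathrm{cum}(r_2(t), r_1^*(t), \mathbf{r}(t), \mathbf{r}(t)^H) = \sum_{p=1}^{P} \gamma_{4,p}\,|\mathbf{A}(1,p)|^2\, e^{-j2\pi\frac{d}{\lambda}\sin\phi_p}\, \mathbf{a}_p \mathbf{a}_p^H = \mathbf{A}\boldsymbol{\Phi}\boldsymbol{\Lambda}\mathbf{A}^H \qquad (3)$$

where $\{\gamma_{4,p}\}_{p=1}^{P}$ are the fourth-order cumulants of the sources; $\boldsymbol{\Lambda} = \mathrm{diag}\{|\mathbf{A}(1,1)|^2\gamma_{4,1}, \ldots, |\mathbf{A}(1,P)|^2\gamma_{4,P}\}$ and $\boldsymbol{\Phi} = \mathrm{diag}\{e^{-j2\pi\frac{d}{\lambda}\sin\phi_1}, \ldots, e^{-j2\pi\frac{d}{\lambda}\sin\phi_P}\}$, with $d$ the separation of the identical sensor pair and $\lambda$ the wavelength. Equation (2) is derived using cumulant properties [CP1], [CP3], [CP5], [CP6] in [2]. Note that the fourth-order cumulant of the additive Gaussian measurement noise is zero.

Having estimated the matrices $\mathbf{C}_{11}$ and $\mathbf{C}_{12}$, and assuming only one source is present, the arrival angle $\phi_1$ of the most powerful source is obtained by following the ESPRIT solution described in the Appendix, as $\hat{\phi}_1 = -\sin^{-1}\left(\frac{\lambda}{2\pi d}\angle(f_x/f_y)\right)$. Note that this step is the same as VESPA except that we assume there is only one source. The procedure in the Appendix also gives the steering vector $\mathbf{a}_1$ of the most powerful source ($i = 1$, $\mathbf{a}_1 = \mathbf{b}_1$). Then proceed by repeating the following steps for $i = 2, \ldots, P$: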


Step 2: Form a modified signal, $\mathbf{r}_i(t) = \mathbf{N}_i\mathbf{r}(t)$, where $\mathbf{N}_i$ is the left null-space of the $M \times (i-1)$ matrix $\mathbf{A}_i = [\mathbf{a}_{i-1}, \ldots, \mathbf{a}_1]$ ($\mathbf{A}_2 = \mathbf{a}_1$). Doing so suppresses the most powerful $(i-1)$ sources in $\mathbf{r}(t)$.

Step 3: Estimate the following two $(M-i+1) \times (M-i+1)$ cumulant matrices:

$$\mathbf{C}_{i1} = \mathrm{cum}(r_{i1}(t), r_{i1}^*(t), \mathbf{r}_i(t), \mathbf{r}_i(t)^H) \qquad (4)$$

$$\mathbf{C}_{i2} = \mathrm{cum}(r_{i2}(t), r_{i1}^*(t), \mathbf{r}_i(t), \mathbf{r}_i(t)^H) \qquad (5)$$

where $r_{ik}(t)$ is the $k$th element of the $(M-i+1)$-vector $\mathbf{r}_i(t)$.

Step 4: Assuming only one source is present, find the modified steering vector $\mathbf{b}_i$ of that source following the procedure in the Appendix.

Step 5: Compute $\mathbf{a}_i = \mathrm{pinv}(\mathbf{N}_i)\mathbf{b}_i$, where pinv denotes pseudoinverse.

Step 6: Use the elements of $\mathbf{a}_i$ corresponding to the identical response pair of sensors to find the direction of the $i$th source. This is done as follows:

Let the identical response pair be the $m$-th and $(m+1)$-th sensors. Then the responses of these sensors to the $i$-th wavefront, i.e. the $m$-th and $(m+1)$-th elements of $\mathbf{a}_i$, are of the form $a_{i,m} = c_i$ and $a_{i,m+1} = c_i e^{-j2\pi\frac{d}{\lambda}\sin\phi_i}$, where $d$ is the separation between the $m$-th and $(m+1)$-th sensors. Consequently, $\phi_i$ can be found from $a_{i,m}$ and $a_{i,m+1}$.

Step 7: Recover the $i$th source using $\mathbf{a}_i$ in an MVDR beamformer.
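The deflation in Steps 2 and 5 can be sketched as follows. This is only an illustration under stated assumptions: the steering vectors are random stand-ins rather than array responses, the null space is obtained from an SVD (the paper does not say how it is computed), and the helper name `left_null_space` is hypothetical.

```python
import numpy as np

def left_null_space(a_known):
    """Rows form an orthonormal basis of the left null space of a_known (M x k, full column rank)."""
    k = a_known.shape[1]
    u, _, _ = np.linalg.svd(a_known, full_matrices=True)
    return u[:, k:].conj().T            # shape (M - k, M)

rng = np.random.default_rng(1)
m = 8
a1 = rng.standard_normal((m, 1)) + 1j * rng.standard_normal((m, 1))  # steering vector already found
a2 = rng.standard_normal(m) + 1j * rng.standard_normal(m)            # steering vector still to find

n2 = left_null_space(a1)                # Step 2: N_i for i = 2
leak = np.linalg.norm(n2 @ a1)          # the strongest source is suppressed by N_i
b2 = n2 @ a2                            # modified steering vector seen in r_i(t)
a2_back = np.linalg.pinv(n2) @ b2       # Step 5: a_i = pinv(N_i) b_i

# pinv(N_i) N_i is the orthogonal projector onto the complement of span(a1),
# so a2_back equals a2 minus its component along a1.
residual = np.linalg.norm(n2 @ a2_back - b2)
along_a1 = abs(a1[:, 0].conj() @ a2_back)
print(n2.shape, leak, residual, along_a1)
```

The recovered vector carries no component along the already-estimated steering vectors, so Step 6 relies on the remaining component still preserving the phase relation between the identical sensor pair.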

3.  Experimental  Results 

In  this  section  we  demonstrate  IVESPA  and  compare  it 
with  VESPA  by  means  of  the  following  experiment,  using  a 
set  of  data  provided  by  our  sponsor,  CRASP. 

Three signals of 1000 symbols each were generated. The signal types are BPSK, BPSK and 16QAM, and they occupy a bandwidth of 350 kHz. These signals were used to modulate wavefronts designed to approximate uniform plane waves impinging upon an 8-element uniform linear array with an element spacing of one half wavelength at 900 MHz. The arrival directions are: BPSK1 at 6.3°, BPSK2 at 25.2° and 16QAM at 40°. The 900 MHz 8-channel measurements were downconverted and sampled at 5.12 MHz.

The  eigenvalues  of  the  estimated  8x8  array  covariance 
matrix  are  as  follows: 

10^  *[6.25, 0.47, 0.03, 0.00, 0.00, 0.00, 0.00, 0.00]  (6) 

First, VESPA was applied to this data. VESPA starts by choosing a guiding sensor pair and estimating two cumulant matrices. In our case, any two of the sensor measurements can be used as the guiding sensor pair since the array is uniform and linear. We used the first two sensors for this purpose, and estimated the following fourth-order cumulant matrices:

$$\mathbf{C}_1 = \mathrm{cum}(r_1(t), r_1^*(t), \mathbf{r}(t), \mathbf{r}(t)^H)$$

$$\mathbf{C}_2 = \mathrm{cum}(r_2(t), r_1^*(t), \mathbf{r}(t), \mathbf{r}(t)^H) \qquad (7)$$

Before  applying  the  rest  of  the  VESPA  steps  we  first 
checked  the  singular  values  of  Ci  and  C2 ;  e.g.,  the  singular 
values  of  Cl  are  found  to  be: 

10*  *  [3.73, 0.06, 0.004,  0.00, 0.00, 0.00, 0.00, 0.00]  (8) 

Observe that the second and third signal singular values, which belong to the second BPSK source and the 16QAM source, respectively, are very small compared to the first singular value, which belongs to the first BPSK signal. One reason why the singular values of the cumulant matrix are more separated than the eigenvalues of the covariance matrix is that the computation of fourth-order cumulant estimates requires fourth powers of the data, and these increase faster than the second powers for high signal levels. Yet another reason is the difference between the fourth-order cumulants of equal-power BPSK and 16QAM signals, as mentioned in Section 1. Applying VESPA, we obtained the following angle estimates:


6.37°, 6.30°, 7.43° (9)

which shows that VESPA is biased towards the most powerful source.

Second, we applied IVESPA to this data, and obtained the following angle estimates:

6.34°, 25.86°, 40.51° (10)

It is seen that the arrival angles are estimated correctly with IVESPA.

Finally, we show that as the sample size is increased, VESPA gives accurate estimates. To show this, we simulated the same real data experiment on a computer, paying particular attention to the signal conditions. We increased the sample size by 500 steps in this range, and for each sample size, we ran both VESPA and IVESPA on the simulated data for 10 realizations of the experiment. The averaged direction-of-arrival estimates obtained from VESPA and the actual values of the DOAs are plotted as a function of data length in Figure 1. It is observed that for short data lengths VESPA fails to give reliable estimates; however, as the data length increases, the estimates converge to their actual values. On the other hand, IVESPA worked well for all values of the data length.

Figure 1: Average value of the direction-of-arrival estimates obtained from 10 realizations of VESPA as the data length is varied. The actual values of the directions-of-arrival are marked on the plot.

4. Conclusions

We presented an iterative high-resolution cumulant-based algorithm (IVESPA) for direction finding and waveform recovery. Our algorithm is based on VESPA of [1]; however, IVESPA can handle some signal scenarios for which VESPA fails to localize all the sources accurately. IVESPA is more general than VESPA in terms of applicability, but computationally more intensive. We demonstrated IVESPA by means of a real-data experiment.

5. Appendix: A procedure for estimating the arrival angle and steering vector of the most dominant source

A modified form of TLS ESPRIT [3] for one source:

Step 1: Stack $\mathbf{C}_{i1}$ and $\mathbf{C}_{i2}$ into a $2(M-i+1) \times (M-i+1)$ matrix $\mathbf{C}$ as follows:

$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{i1} \\ \mathbf{C}_{i2} \end{bmatrix}$$

and perform the SVD of $\mathbf{C}$; keep the first left singular $2(M-i+1)$-vector of $\mathbf{C}$. Let this vector be $\mathbf{u}_i$.

Step 2: Partition $\mathbf{u}_i$ into two $(M-i+1)$-vectors $\mathbf{u}_{i1}$ and $\mathbf{u}_{i2}$.

Step 3: Perform the SVD of $[\mathbf{u}_{i1}, \mathbf{u}_{i2}]$; keep the last right singular vector of $[\mathbf{u}_{i1}, \mathbf{u}_{i2}]$. Let this 2-vector be $\mathbf{f}$.

Step 4: Partition $\mathbf{f}$ as $\mathbf{f} = [f_x, f_y]^T$.

Step 5: An estimate of the modified steering vector of the source is obtained to within a scalar, as $\mathbf{b}_i = \mathbf{u}_{i1} - \frac{f_y}{f_x}\mathbf{u}_{i2}$.
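The one-source procedure of the Appendix can be tried on noise-free, rank-one cumulant matrices. The sketch below is a hedged illustration, not the authors' code: the matrices are synthetic, the function name `one_source_tls_esprit` is hypothetical, and the relation `phase = -f_x / f_y` is derived here from the null-vector condition of Step 3, so its sign convention is an assumption of this sketch.

```python
import numpy as np

def one_source_tls_esprit(c1, c2):
    """Return (phase_factor, steering_estimate) for a single dominant source."""
    n = c1.shape[0]
    u, _, _ = np.linalg.svd(np.vstack([c1, c2]))          # Step 1: SVD of the stacked matrix
    u1, u2 = u[:n, 0], u[n:, 0]                           # Step 2: split first left singular vector
    _, _, vh = np.linalg.svd(np.column_stack([u1, u2]))   # Step 3: SVD of [u1, u2]
    f_x, f_y = vh[-1].conj()                              # Step 4: last right singular vector
    # [u1, u2] f ~ 0 with u2 ~ phase * u1 gives f_x + phase * f_y = 0.
    phase = -f_x / f_y
    b = u1 - (f_y / f_x) * u2                             # Step 5: steering vector up to a scalar
    return phase, b

rng = np.random.default_rng(2)
n = 6
b_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # stand-in modified steering vector
phase_true = np.exp(-1j * 0.7)                                  # stand-in phase between the sensor pair
gamma = -2.0                                                    # cumulant scale (illustrative)
c1 = gamma * np.outer(b_true, b_true.conj())
c2 = phase_true * c1

phase_est, b_est = one_source_tls_esprit(c1, c2)
cosine = abs(b_true.conj() @ b_est) / (np.linalg.norm(b_true) * np.linalg.norm(b_est))
print(phase_est, cosine)
```

In this noise-free case the phase factor is recovered exactly and the steering estimate is parallel to the true vector; with sample cumulants both become approximate.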
Abstract

Reducing transportation noise is a major environmental concern. For railways, high speed creates new noise sources. This paper describes the last step of a study of moving acoustic sources. Localization methods using microphone arrays provide the positions, acoustic powers, and spectra of the sources. The proposed method computes the directivity pattern of the sources and gives a time-frequency representation of the emitted signal while the sources pass the measurement point. Experiments are carried out to characterize the acoustic sources of a high speed train (TGV) in real operating conditions.


1.  Introduction 

Localization techniques using an array of microphones provide the acoustic power, the position and the spectrum of the source. Beamforming cannot be used [4] to localize high speed moving noise sources without modifications. Dedopplerisation [1] is a method currently used. Another technique [5] needs a time-frequency analysis of the output of the array. In this case, the array is focused at end-fire. The time localization property of the bilinear time-frequency representation and the spatial selectivity of the array perform the localization. The directivity pattern of moving acoustic sources is much more difficult to estimate. Its computation must be performed while the source passes the measurement area.

In this paper, a method to measure the directivity pattern of a moving noise source is presented. Only linear trajectories and constant speed movement are considered. Sources are assumed to have been localized by one of the methods described above. First, the principle of the technique is described. Then, the array processing technique is presented and the choice of the bilinear time-frequency distribution is discussed. Some simulations are carried out to test the method with several directivity patterns. Finally, the directivity pattern of a moving source on a high speed train (TGV) is computed.

2.  Proposed  method 

2.1  Principle 

In order to estimate the directivity pattern of a noise source, its level is measured for several observation angles around it. When the source is moving, it is difficult to move around it. On the other hand, the passing source can be observed from several angles. The array processing performs the tracking. Then, the evolution of its level may be computed by a time-frequency representation. The result is presented as a polar plot according to the observation angle in the source space. Figure 1 presents the three steps of the technique described below.

2.2  Source  tracking 

Firstly, an array processing step focuses on the moving source during its pass-by. The employed method is very similar to the dedopplerisation method used in the context of localization by [1]. Here, the position of the source is considered to be chosen.

0-8186-7576-4/96 $5.00 © 1996 IEEE

The Doppler effect is removed by rebuilding the emitted signal which would be received on a sensor in the case of a non-moving source. The array output of N microphones is computed with the following equation:


S{t) 


Em  ^ 

1  =  1  Ri 


(1) 


where $P_i(t)$ is the pressure on microphone $i$ at time $t$, $R_i$ is the distance between microphone $i$ and the focused source at time $t$, $\alpha_i$ is a coefficient of a weighting window and $c$ is the sound velocity. The computation of the acoustic pressure at time $t + R_i/c$ needs an interpolation between two samples. The output signal $S(t)$ corresponds to the emitted signal of the focused source, windowed by a spatial filter moving around it. The continuous estimation of the signal level provides the directivity pattern.

2.3 Bilinear transformation of the reconstructed signal

During the previous tracking, the Doppler effect is not perfectly removed because of the interpolations, so a frequency modulation remains. An efficient analysis tool must be able to track the signal level of the output of the array processing. Flandrin [2] has shown that some transformations belonging to Cohen's class are well suited to track frequency shifts:

$$C_S(t,f) = \iiint e^{j2\pi\eta(t'-t)}\, f(\eta,\tau)\, S\left(t'+\frac{\tau}{2}\right) S^*\left(t'-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\eta\, dt'\, d\tau \qquad (2)$$

where $f(\eta,\tau)$ makes it possible to build the transformation suited to a given frequency evolution.

To a first approximation, the frequency evolves slowly according to a linear law. Among Cohen's transformations (equation 2), the Wigner-Ville (WV) transformation is optimal for following a linear frequency modulation:

$$WV_S(t,f) = \int S\left(t+\frac{\tau}{2}\right) S^*\left(t-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\tau \qquad (3)$$

Another property of this transformation, in contrast with the Fourier transform, is the conservation of the time support of the signal. This property makes it possible to localize the time at which the signal appears with more accuracy.

In practice, the Pseudo-Smoothed Wigner-Ville (PSWV) transformation is used to reduce the interference terms due to the bilinear structure of the WV distribution.

2.4 Directivity pattern representation

The previous step of the proposed method provides a time-frequency diagram of the output of the array processing. It corresponds to the levels of the focused point at each frequency. An algorithm follows the maximum level along the modulation curve around the emitted frequency. A reference of the source position is taken at the beginning of the tracking. The time axis of the time-frequency diagram is converted into observation angles in the source space. The directivity pattern corresponding to the position and the frequency selected can be drawn in the source space.

3. Simulations

In order to test the described method, some simulations are carried out. A sine-wave source at frequency $f$, considered to be localized, is moving along a linear trajectory at constant speed $v$. Its directivity pattern is a $\cos\theta$ shape, like a dipole source. The received pressure on microphone $i$ is given in [3]. In that expression, $M_a = v/c$ is the Mach number, $D$ is the distance between the trajectory and the receiver and $D(\theta_i)$ describes the source directivity.

The output of a linear array of 29 microphones spaced 6 cm apart and located 6.5 m away from the trajectory is computed. Figure 3 shows the directivity pattern $D(\theta)$ of the simulated source as a dotted line. The tracking of the assumed source position is achieved with equation 1. The PSWV transformation of the rebuilt signal is computed and presented in figure 2. In this simulated case, the Doppler effect is perfectly removed and the constant source frequency appears. The time axis is converted into a $\theta$ axis. At time 0 s, the source is at the end-fire of the array. The measured directivity pattern drawn in the source space is presented as a solid line in figure 3. This result can be compared with the directivity pattern of the simulated source presented on the same picture.

The proposed method has been tested for several directivity patterns with different shapes and directions and also for different source speeds. In all cases, it measures a directivity pattern corresponding to the simulated one.
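The tracking step of equation 1 can be sketched for a single microphone and an omnidirectional source. Several things here are assumptions made for illustration only: the sound velocity, source speed, emitted frequency and sampling rate are chosen arbitrarily; the spreading law is taken as 1/R and is compensated by multiplying by R; and the weighting window and the sum over microphones are omitted.

```python
import numpy as np

c = 340.0        # sound velocity in m/s (assumed value)
v = 80.0         # source speed in m/s (assumed value)
dist = 6.5       # distance between trajectory and microphone in m
f0 = 1000.0      # emitted frequency in Hz (assumed value)
fs = 50000.0     # sampling rate in Hz (assumed value)

def source_range(tau):
    """Distance between the microphone and the source at emission time tau."""
    return np.hypot(dist, v * tau)

# Emitted signal on a uniform emission-time grid.
tau = np.arange(-0.1, 0.1, 1.0 / fs)
emitted = np.sin(2.0 * np.pi * f0 * tau)

# Simulated microphone signal: what is emitted at tau arrives at tau + R/c, attenuated by 1/R.
arrival = tau + source_range(tau) / c
t_mic = np.arange(arrival[0], arrival[-1], 1.0 / fs)
p_mic = np.interp(t_mic, arrival, emitted / source_range(tau))

# Tracking: read the microphone at t + R/c (interpolating between samples) and undo the spreading.
rebuilt = source_range(tau) * np.interp(tau + source_range(tau) / c, t_mic, p_mic)

# Skip the last samples, where the query time can fall just outside the microphone time grid.
max_error = np.max(np.abs(rebuilt[:-5] - emitted[:-5]))
print(max_error)
```

The rebuilt signal matches the emitted sine up to a small interpolation error, which is the residual effect that motivates the time-frequency analysis of Section 2.3.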

4.  Measurements 

This method is applied to acoustic sources in a real situation. The previous linear array configuration is used. Its frequency range is [2000 Hz, 4000 Hz]. An experiment with a moving acoustic source for which the position and the frequency are known is carried out. A loudspeaker is fixed on a high speed train (TGV) and generates two sine-waves at 3000 Hz and 3400 Hz.

Figure 4 shows the result of the localization [5] of the acoustic source on the train. The position along the train is -11.5 m. This area is selected to be analyzed with the proposed method. The PSWV of the tracking signal is presented in figure 5. The Doppler effect is not perfectly removed. The energy of the signal is concentrated around two frequencies corresponding to the emitted ones. Following the maximum levels along the modulation curves makes it possible to extract two directivity patterns at frequencies 3000 Hz and 3400 Hz, shown in figures 6 and 7 in the source space. The frequency evolution of the source during the tracking shows that the algorithm follows a single source.
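Following the maximum level along a modulation curve can be illustrated on a synthetic signal. The sketch below uses a plain discrete Wigner-Ville distribution rather than the PSWV used in the paper, and a complex linear chirp as a stand-in for the residual frequency modulation; the function name `wigner_ville` and all numerical values are assumptions.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution; row k corresponds to frequency k / (2 N) cycles per sample."""
    x = np.asarray(x, dtype=complex)
    n_samples = x.size
    tfr = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        m_max = min(n, n_samples - 1 - n)            # largest lag that stays inside the signal
        m = np.arange(-m_max, m_max + 1)
        kernel = np.zeros(n_samples, dtype=complex)
        kernel[m % n_samples] = x[n + m] * np.conj(x[n - m])
        tfr[:, n] = np.fft.fft(kernel).real          # kernel is Hermitian in the lag, so the FFT is real
    return tfr

n_samples = 128
n = np.arange(n_samples)
f_start, f_stop = 0.10, 0.30                         # normalized frequencies (assumed values)
rate = (f_stop - f_start) / n_samples
true_freq = f_start + rate * n
signal = np.exp(2j * np.pi * (f_start * n + 0.5 * rate * n ** 2))

tfr = wigner_ville(signal)
ridge = np.argmax(tfr, axis=0) / (2.0 * n_samples)   # frequency of the maximum level at each time
level = np.max(tfr, axis=0)                          # level followed along the modulation curve
ridge_error = np.max(np.abs(ridge[32:96] - true_freq[32:96]))
print(ridge_error)
```

Away from the edges of the record the ridge follows the linear frequency law to within one frequency bin, and `level` plays the role of the quantity that is mapped to observation angle in Section 2.4.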

For both frequencies, the shapes of the directivity patterns are similar. The movement introduces a small rotation of the diagrams. This technique has been successfully applied to noise sources of the train.

5. Conclusion

The proposed method is the final step of the study of moving acoustic sources. A localization technique provides, for each source, the acoustic power, the position along the train, the height and the spectrum. The time-frequency representation of the source tracking makes it possible to characterize the emitted signal. If the source is localized in position and frequency, its directivity pattern can be computed.

The main advantage of this technique is that the measurement is performed in real operating conditions. In this case, the rotation of the directivity pattern, probably due to an aeroacoustic effect, can be observed. Some other effects of the movement, depending on the source speed for example, have been noticed. This method thus improves the understanding of the phenomena responsible for noise generation.
Figure 1. Diagrammatic representation of the method.

Figure 2. PSWV transformation of the signal corresponding to the tracking of the selected source.

Figure 3. Directivity pattern in cos θ (dB) of the simulated source (dotted line) and directivity pattern in dB measured by the proposed method (solid line).

Figure 4. Localization of acoustic sources on the train.

Figure 5. PSWV transformation of the tracking signal.

Figure 6. Directivity pattern in dB of the source at frequency 3000 Hz and frequency error in percent during the tracking of the maximum level.

Figure 7. Directivity pattern in dB of the source at frequency 3400 Hz and frequency error in percent during the tracking of the maximum level.
Abstract


covariance  matrix 


In  this  paper  we  present  a  novel  method  for  spatial  and 
temporal  frequency  estimation  in  the  case  of  uncorrelated 
sources.  By  imposing  the  diagonal  structure  given  in  the  sig¬ 
nal  covariance  matrix,  it  is  possible  to  improve  the  perfor¬ 
mance  of  subspace  based  estimators.  The  proposed  method 
combines  ideas  from  subspace  and  covariance  matching 
methods  to  yield  a  non-iterative  frequency  estimation  algo¬ 
rithm.  In  a  numerical  example  we  show  that  the  estima¬ 
tor  has  a  lower  small  sample  resolution  threshold  than  root- 
MUSIC  and  similar  large  sample  performance. 


1.  Introduction 

Estimating frequencies from uniformly sampled data has been an active research area for decades. A number of so-called high-resolution algorithms, or eigenstructure methods, have been presented and analyzed in the literature, e.g., [4, 6, 7, 8]. One disadvantage of these subspace-based
methods  is  that  it  is  difficult  to  incorporate  knowledge  of 
the  source  correlation  into  the  eigendecomposition.  In  this 
paper  we  propose  an  estimator  which  combines  ideas  from 
subspace  and  covariance  matching  methods.  The  objective 
is  to  find  a  frequency  estimator  which  uses  the  knowledge 
of  the  signal  correlation  without  significantly  increasing  the 
estimator  complexity.  In  a  numerical  example  we  show  that 
the  proposed  method  has  promising  small  sample  perfor¬ 
mance. 


$$R = A(\omega)\, S\, A^*(\omega) + \sigma^2 I , \qquad (1)$$

where $d$ is the number of frequencies and where $\omega = [\omega_1, \ldots, \omega_d]^T$. In what follows we assume that $d$ is known; if unknown, it can be estimated from the data by techniques described in [2, 9].

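In (1), the $d \times d$ matrix $S$ denotes the unknown diagonal signal covariance matrix, $\sigma^2$ is the unknown noise variance and the $m \times d$ Vandermonde matrix $A(\omega)$ is defined by

$$A(\omega) = \begin{pmatrix} 1 & \cdots & 1 \\ e^{i\omega_1} & \cdots & e^{i\omega_d} \\ \vdots & & \vdots \\ e^{i(m-1)\omega_1} & \cdots & e^{i(m-1)\omega_d} \end{pmatrix} \qquad (2)$$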


where  m  is  the  number  of  sensors  in  the  array  processing 
case  and  the  data  window  length  in  the  temporal  case. 
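The model in (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the authors' code; the function names `vandermonde` and `model_covariance` and the numerical values are assumptions chosen for the example.

```python
import numpy as np

def vandermonde(omega, m):
    """m x d Vandermonde matrix with entries exp(1j * k * omega_l), k = 0..m-1."""
    omega = np.asarray(omega, dtype=float)
    k = np.arange(m).reshape(-1, 1)
    return np.exp(1j * k * omega.reshape(1, -1))

def model_covariance(omega, powers, sigma2, m):
    """R = A S A^* + sigma^2 I with a diagonal S (uncorrelated sources)."""
    A = vandermonde(omega, m)
    S = np.diag(np.asarray(powers, dtype=float))
    return A @ S @ A.conj().T + sigma2 * np.eye(m)

# Hypothetical values: two frequencies, five sensors (or a window of length five).
R = model_covariance(omega=[0.5, 1.1], powers=[2.0, 1.0], sigma2=0.5, m=5)
```

Because every entry of $A(\omega)$ has unit modulus, each diagonal entry of `R` equals the sum of the source powers plus the noise variance.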

In the spatial frequency estimation problem, the matrix $A$ is often parameterized by the directions of arrival (DOA), denoted by $\theta$. For a linear array with uniformly spaced elements, the relationship between $\omega$ and $\theta$ is given by

$$\omega_k = \frac{2\pi\Delta\sin(\theta_k)}{c} \qquad (3)$$

where $\Delta$ is the element spacing and $c$ denotes the speed of propagation of the impinging wave, and where $\theta_k$ is measured relative to array broadside.


2.  Problem  Formulation 

The well-known problem of estimating temporal or spatial frequencies from uniformly sampled data corrupted by additive white noise can be reduced to the problem of determining the parameters in the following model of the data covariance matrix, given in (1).


3.  Frequency  Estimation 

The focus of this paper is on how to estimate the frequencies in the vector $\omega = [\omega_1, \ldots, \omega_d]^T$. In particular we would like to use the knowledge that the signals are uncorrelated without increasing the estimator complexity considerably.
The so-called subspace estimation techniques rely on the properties of the eigendecomposition of (1). Let $\{\lambda_k\}$ denote the eigenvalues of $R$ arranged in descending order, i.e., $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Since $A$ is full rank, due to its Vandermonde structure, and $S$ is positive definite, it follows that

$$\lambda_k > \sigma^2 \quad \text{for } k = 1, \ldots, d ,$$
$$\lambda_{d+1} = \cdots = \lambda_m = \sigma^2 . \qquad (4)$$

The eigenvectors of $R$ corresponding to $\{\lambda_1, \ldots, \lambda_m\}$ are denoted by $\{e_1, \ldots, e_m\}$. Define

$$E_s = [e_1, \ldots, e_d] , \qquad (5)$$
$$E_n = [e_{d+1}, \ldots, e_m] , \qquad (6)$$
$$\Lambda_s = \mathrm{diag}[\lambda_1, \ldots, \lambda_d] , \qquad (7)$$
$$\Lambda_n = \mathrm{diag}[\lambda_{d+1}, \ldots, \lambda_m] = \sigma^2 I , \qquad (8)$$

where the notation $\mathrm{diag}[\cdot]$ refers to a diagonal matrix with the arguments as diagonal elements. With the notation introduced above we have

$$R = E_s \Lambda_s E_s^* + \sigma^2 E_n E_n^* . \qquad (9)$$

Combining the two expressions for $R$ in (1) and (9) yields the following equality

$$A S A^* + \sigma^2 I = E_s \Lambda_s E_s^* + \sigma^2 E_n E_n^* . \qquad (10)$$

Since $E_n E_n^* = I - E_s E_s^*$, it follows that

$$A S A^* = E_s \Lambda E_s^* , \qquad (11)$$


where $\Lambda = \Lambda_s - \sigma^2 I$. By using the vec-operator ($d = \mathrm{vec}(D)$ is a vector obtained by stacking the columns of $D$), (11) can be written as (using $\mathrm{vec}(XYZ) = (Z^T \otimes X)\,\mathrm{vec}(Y)$)

$$(A^c \otimes A)\,\mathrm{vec}(S) = (E_s^c \otimes E_s)\,\mathrm{vec}(\Lambda) \qquad (12)$$

where $\otimes$ denotes the Kronecker matrix product and where $(\cdot)^c$ denotes complex conjugation. Since $S$ and $\Lambda$ are diagonal matrices with real-valued entries, there exists a $(d^2 \times d)$ selection matrix $L$ such that

$$\mathrm{vec}(S) = L s , \quad \mathrm{vec}(\Lambda) = L \lambda , \qquad (13)$$

where $s$ and $\lambda$ are vectors consisting of the diagonal entries of $S$ and $\Lambda$, respectively.
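The vectorized identity (12) and the selection matrix in (13) can be checked numerically. The sketch below uses the exact covariance matrix and hypothetical values for the frequencies, powers and noise variance; the helper `vec` and all variable names are assumptions introduced for illustration.

```python
import numpy as np

m, d, sigma2 = 4, 2, 0.3
omega = np.array([0.4, 1.3])                      # hypothetical frequencies
A = np.exp(1j * np.outer(np.arange(m), omega))    # m x d Vandermonde matrix
s = np.array([1.5, 0.7])                          # diagonal of S
R = A @ np.diag(s) @ A.conj().T + sigma2 * np.eye(m)

# Eigendecomposition with eigenvalues sorted in descending order.
lam, E = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]
Es = E[:, :d]
lam_vec = lam[:d] - sigma2                        # diagonal of Lambda_s - sigma^2 I

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

# Selection matrix L (d^2 x d) such that vec(diag(x)) = L x.
L = np.zeros((d * d, d))
for k in range(d):
    L[k * d + k, k] = 1.0

lhs = np.kron(A.conj(), A) @ L @ s                # (A^c kron A) vec(S)
rhs = np.kron(Es.conj(), Es) @ L @ lam_vec        # (E_s^c kron E_s) vec(Lambda)
```

With exact quantities the two sides agree to machine precision; with a sample covariance they agree only approximately, which is what motivates a least squares fit.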

Let $\hat{R}$ denote a sample estimate of the theoretical covariance matrix, and let $\hat{E}_s$ be the estimated "signal subspace" obtained from an eigendecomposition of $\hat{R}$ similar to (9). Replacing $E_s$ and $\Lambda$ with their estimates in (12) we have

$$B(\omega)\, s \approx F \hat{\lambda} , \qquad (14)$$

where $B(\omega) = (A^c \otimes A) L$ and $F = (\hat{E}_s^c \otimes \hat{E}_s) L$. We now propose to estimate the unknown parameters by minimizing the following least squares criterion

$$\| B(\omega)\, s - F \hat{\lambda} \|_2^2 . \qquad (15)$$

Minimizing with respect to $s$ yields

$$\hat{s} = B^\dagger(\omega) F \hat{\lambda} , \qquad (16)$$

where $B^\dagger$ denotes the pseudo-inverse of $B$. Substituting this back into the criterion we arrive at the following criterion for finding the frequency estimates

$$\min_{\omega} \| P_B^\perp(\omega) F \hat{\lambda} \|_2^2 , \qquad (17)$$

where $P_B^\perp = I - B B^\dagger$ is the orthogonal projector onto the null space of $B^*$. The criterion (17) is in general multimodal, rendering the multidimensional search for a global extremum computationally expensive. In the following we will use the ideas in [1, 5, 6] to rewrite the minimization in (17) in a computationally much more attractive form. From the definition of $B(\omega)$ it follows that the $k$th column of $B$ is given by

$$b_k = [\,1 \;\; z_k \;\cdots\; z_k^{m-1} \;\;\; z_k^{-1} \;\; 1 \;\cdots\; z_k^{m-2} \;\;\cdots\;\; z_k^{-(m-1)} \;\cdots\; 1\,]^T \qquad (18)$$

where $z_k = e^{i\omega_k}$. Observing the shift structure in (18) it is possible to parameterize the nullspace of $B^*$ by the coefficients in the following polynomial

$$g_0 z^d + g_1 z^{d-1} + \cdots + g_d = g_0 \prod_{k=1}^{d} (z - z_k) , \quad g_0 \neq 0 . \qquad (19)$$

Define a full rank matrix $G$ of dimension $m^2 \times (m^2 - d)$, which depends linearly on the coefficients in (19), such that

$$G^* B = 0 . \qquad (20)$$

This implies that the columns of $G$ constitute a basis for the nullspace of $B^*$ and

$$P_G = G (G^* G)^{-1} G^* = P_B^\perp . \qquad (21)$$

In order to illustrate the parameterization a simple example is provided.

Example: Assume $m = 2$ and $d = 1$, which implies that the matrix $B$ consists of one column only. The polynomial (19) will in this case be given by

$$g_0 z + g_1 = 0 . \qquad (22)$$
Using (22) and the shift structure in (18), we can write (20) as

$$G^* B = \begin{pmatrix} g_1 & g_0 & 0 & 0 \\ 0 & g_0 & 0 & g_1 \\ 0 & 0 & g_1 & g_0 \end{pmatrix} \begin{pmatrix} 1 \\ z \\ z^{-1} \\ 1 \end{pmatrix} = 0 .$$
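The example with $m = 2$ and $d = 1$ can be verified numerically. The sketch below picks a hypothetical frequency, chooses the polynomial coefficients so that $g_0 z + g_1 = 0$, and checks that the resulting $G$ annihilates the single column of $B$ and spans the same projector as in (21). Variable names are assumptions for illustration.

```python
import numpy as np

omega = 0.9                       # hypothetical frequency
z = np.exp(1j * omega)
g0 = 1.0
g1 = -g0 * z                      # root of g0*z + g1 = 0 is z

# Single column of B for m = 2, d = 1: [1, z, z^{-1}, 1]^T
b = np.array([1, z, 1 / z, 1])

# G^* as written in the example (3 x 4); G is its conjugate transpose (4 x 3).
G_star = np.array([[g1, g0, 0, 0],
                   [0, g0, 0, g1],
                   [0, 0, g1, g0]])
G = G_star.conj().T

# Projector built from G, and projector onto the orthogonal complement of b.
P_G = G @ np.linalg.inv(G.conj().T @ G) @ G.conj().T
P_B_perp = np.eye(4) - np.outer(b, b.conj()) / np.vdot(b, b)
```

Both projectors coincide because the three independent columns of `G` span the whole three-dimensional null space of $B^*$.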

By using the parameterization described above it follows that the criterion (17) can be reformulated as

$$\min_{g} \| (G^* G)^{-1/2} G^* F \hat{\lambda} \|_2^2 , \qquad (23)$$

where the minimization is over the polynomial coefficients in (19). A two-step estimation procedure can now be devised as follows:

1. Obtain a consistent estimate of $\{g_k\}$ by minimizing the quadratic function obtained by replacing $(G^* G)^{-1/2}$ in (23) by some positive definite matrix $W$.

2. Use the estimate of $\{g_k\}$ from step 1 to construct a consistent estimate of $(G^* G)^{-1/2}$. Insert this in (23) and solve a new quadratic problem. The frequency estimates are then given by rooting the polynomial (19).


Figure 1. MSE values for $\theta_1$ versus the number of snapshots, N: 'x' - proposed method, 'o' - root-MUSIC. The dash-dotted line represents the CRB when the correlation structure of the sources is known and the dotted line is the CRB without this knowledge.


It  can  be  shown  that  this  two-step  procedure  has  the  same 
large  sample  accuracy  as  the  estimates  obtained  by  minimiz¬ 
ing  (17).  The  main  advantage  is  that  we  avoid  the  non-linear 
parameter  search.  For  small  sample  scenarios  it  can  be  use¬ 
ful  to  reiterate  step  2  a  few  times  to  improve  the  accuracy. 

4.  Numerical  Example 

Here a numerical example is provided to demonstrate the performance of the proposed method. We consider the direction of arrival estimation of two waves impinging from angles $\theta_1 = 10°$ and $\theta_2 = 20°$ on a ULA with 5 elements separated by a half wavelength. The uncorrelated signal sources are modeled as white and complex Gaussian distributed, each with SNR = 3 dB. The MSE values for different data lengths are calculated for the proposed method and for root-MUSIC [3]; each MSE value is based on 200 independent trials. The MSE for $\theta_1$ is depicted in Fig. 1. This example demonstrates the superior performance of the proposed method for small sample scenarios compared to root-MUSIC.
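The data for such an experiment could be generated along the following lines. This is a sketch under assumptions: the spatial frequency is taken as $\pi \sin\theta$ for half-wavelength spacing, the noise variance is set to one so that the SNR equals the per-source power, and `simulate_snapshots` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def simulate_snapshots(thetas_deg, m, n_snap, snr_db, rng):
    """Snapshots of a half-wavelength ULA observing uncorrelated Gaussian sources."""
    omega = np.pi * np.sin(np.deg2rad(thetas_deg))       # assumed spatial frequencies
    A = np.exp(1j * np.outer(np.arange(m), omega))
    d = len(omega)
    power = 10.0 ** (snr_db / 10.0)                       # unit noise variance assumed
    s = np.sqrt(power / 2) * (rng.standard_normal((d, n_snap))
                              + 1j * rng.standard_normal((d, n_snap)))
    n = np.sqrt(0.5) * (rng.standard_normal((m, n_snap))
                        + 1j * rng.standard_normal((m, n_snap)))
    return A @ s + n

rng = np.random.default_rng(0)
X = simulate_snapshots([10.0, 20.0], m=5, n_snap=100, snr_db=3.0, rng=rng)
R_hat = X @ X.conj().T / X.shape[1]                       # sample covariance matrix
```

Repeating this for 200 independent draws per data length and averaging the squared estimation error would give MSE curves of the kind shown in Fig. 1.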

5.  Conclusions 

The  main  idea  in  this  paper  was  to  present  a  novel  method 
for  spatial  and  temporal  frequency  estimation  in  the  case  of 
uncorrelated  sources.  By  imposing  the  diagonal  structure 
given in the signal covariance matrix, it is possible to improve the performance of subspace-based estimators.
Abstract

The purpose of this paper is the passive angular location of wideband sources using an array of sensors. The benefit of knowing the antenna shape when the processing is applied to the received data is illustrated by the improvement in the signal-to-noise ratio and by the increase in the antenna directivity.

In this paper, an extension of the propagator method is presented: an algebraic operator is extracted from the cross-spectral matrices of the data or from the received signals. This technique avoids the rather expensive eigendecomposition of the cross-spectral matrices at each frequency of the analysis bandwidth that is used in the known methods. The results of simulations support the theoretical predictions.

1.  Introduction 

The estimation of the direction of arrival (DOA) of multiple wideband signals is a recent problem in array signal processing. Many techniques have been reported in the literature [1-3], of which eigenstructure methods are among the most established. The concept of signal subspace processing has been used in the wideband case [1]. The basic idea is to use a coherent signal subspace estimate obtained by the eigendecomposition of a frequency-domain combination of modified narrowband cross-spectral matrix estimates. It is shown that the coherent subspace method is an alternative to the incoherent subspace method that improves the efficiency of the estimation by focusing the energy of the analysis bandwidth onto a focusing frequency. A similar technique has been proposed in [2]; the originality of this method lies in the construction of the focusing operators, which are used to transform the signal subspaces. Generally, these methods perform better than the classical methods, but their rather expensive computational load limits their implementation. To avoid this difficulty, several papers [4-9] have been published with the aim of reducing the computational load of the eigendecomposition or of estimating the signal subspace without eigendecomposition. The propagator method [4-6] is one of these methods and is considered an alternative to the MUSIC method.

In  this  paper,  we  introduce  an  extension  of  the  propa¬ 
gator  method  for  broadband  sources.  The  transformation 
of  the  incoherent  propagators  is  performed  through  focus¬ 
ing  matrices.  The  obtained  coherent  propagator  is  used  to 
estimate  the  antenna  shape  and  the  DOA  of  the  sources. 

2.  Problem  formulation 

We consider an array of $N$ sensors which receives the wavefield generated by $P$ wideband sources in the presence of additive noise. The array geometry is arbitrary. The received signal vector, in the frequency domain, is given by:

$$r(f_j) = A(f_j)\, s(f_j) + n(f_j)$$

where $r(f_j)$ is the Fourier transform of the array output vector, $s(f_j)$ is the source vector, $n(f_j)$ is the sensor noise, and $A(f_j)$ is the $N \times P$ transfer matrix of the source-sensor array system with respect to some chosen reference point.

It is assumed that the array is unambiguous and calibrated, so that the rank of $A(f_j)$ is equal to $P$ for any frequency. The sensor noise is assumed to be independent of the source signals and either spatially white or with a cross-spectral matrix that is known up to a scale factor. In the latter case, a prewhitening step is required to obtain a diagonal noise cross-spectral matrix. The sources are not fully correlated.

The cross-spectral matrix of the observation vector at frequency $f_j$ is given by:

$$\Gamma(f_j) = A(f_j)\,\Gamma_s(f_j)\,A^+(f_j) + \sigma^2(f_j)\, I$$

where the superscript $+$ represents the Hermitian transpose and $\Gamma_s(f_j)$ is the source cross-spectral matrix.

Our aim is to estimate the angles $\theta_i$, $i = 1, \ldots, P$, and the antenna shape from the received data. In this paper, the detection of the sources is not treated. We assume that the number of sources $P$ is known or can be estimated [10].

For  locating  the  wideband  sources  several  solutions 
have  been  proposed  in  the  literature  and  are  summarized 




as follows:

- The incoherent subspace methods: the analysis bandwidth is divided into several frequency bins, the processing is applied at each frequency, and the individual results are combined to obtain the final result.

- The coherent subspace methods: the different subspaces are transformed into a predefined subspace using the focusing matrices.

For estimating the antenna shape, the existing methods treat the narrowband case [5-11]. Temporal methods have been proposed for wideband signals, but they have had little success because of their low spatial resolution.

3.  Narrow-band  propagator  method 


In this section, we briefly recall the propagator method. We consider the noise-free situation, i.e.:

$$r(f_j) = A(f_j)\, s(f_j).$$

The matrix of source directions is partitioned [4-6] into two block matrices:

$$A(f_j) = \begin{bmatrix} X(f_j) \\ Y(f_j) \end{bmatrix}$$

where $X(f_j)$ is a $P \times P$ matrix and $Y(f_j)$ is an $(N - P) \times P$ matrix. We assume that the propagation model is such that $X(f_j)$ is nonsingular; for example, if the first $P$ sensors are linear and equispaced, then $X(f_j)$ is a Vandermonde matrix.

Since the last $(N - P)$ rows of $A(f_j)$ are linearly dependent on the first $P$ rows, we can write $Y(f_j)$ as:

$$Y(f_j) = \Pi^+(f_j)\, X(f_j) \quad \text{or} \quad \Pi^+(f_j) = Y(f_j)\, X^{-1}(f_j)$$

The $P \times (N - P)$ matrix $\Pi(f_j)$ is called the propagator [4-6].

We define the matrix $Q(f_j)$ as:

$$Q^+(f_j) = [\,\Pi^+(f_j) \mid -I\,]$$

It is easy to see that:

$$Q^+(f_j)\, A(f_j) = \Pi^+(f_j)\, X(f_j) - Y(f_j) = 0$$

or $Q^+(f_j)\, a_p(f_j) = 0$ for $p = 1, \ldots, P$ and $j = 1, \ldots, M$.
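The partition and the orthogonality relation above can be illustrated at a single frequency. The sketch assumes a uniform linear array with half-wavelength spacing and a particular phase sign convention; the propagator method itself allows an arbitrary geometry, and the variable names are hypothetical.

```python
import numpy as np

N, P = 6, 2
theta = np.deg2rad([10.0, 25.0])                  # hypothetical source directions

# Assumed steering matrix (N x P) of a half-wavelength ULA.
A = np.exp(-1j * np.pi * np.outer(np.arange(N), np.sin(theta)))

X, Y = A[:P, :], A[P:, :]                         # P x P and (N-P) x P blocks
Pi_H = Y @ np.linalg.inv(X)                       # Pi^+ = Y X^{-1}
Q_H = np.hstack([Pi_H, -np.eye(N - P)])           # Q^+ = [Pi^+ | -I]

residual = Q_H @ A                                # should vanish for every source
```

Every column of `A` is annihilated by `Q_H`, which is the property later exploited to scan over candidate directions.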

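The construction of $Q(f_j)$ requires knowledge of the directions of the sources and of the geometry of the antenna, so the previous result cannot be used directly. With the former partition, the cross-spectral matrix is:

$$\Gamma(f_j) = \begin{bmatrix} \Gamma_{11}(f_j) & \Gamma_{12}(f_j) \\ \Gamma_{21}(f_j) & \Gamma_{22}(f_j) \end{bmatrix}, \qquad \Gamma_{11}(f_j) = X(f_j)\,\Gamma_s(f_j)\,X^+(f_j), \quad \Gamma_{21}(f_j) = \Pi^+(f_j)\,\Gamma_{11}(f_j).$$

We have for example:

$$\Gamma_{12}(f_j) = \Gamma_{11}(f_j)\,\Pi(f_j)$$

It follows that the estimate of the propagator is:

$$\hat{\Pi}(f_j) = \Gamma_{11}^{-1}(f_j)\,\Gamma_{12}(f_j)$$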

Other partitions of the cross-spectral matrix are given in [4-6]. We note that the partition in [4-6] can lead to significant computational complexity for $P \ll N$.
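In the noise-free case, the relation $\Gamma_{12} = \Gamma_{11}\Pi$ means the propagator can be obtained from the blocks of the cross-spectral matrix by solving a linear system, without any eigendecomposition. The sketch below checks this against the propagator computed from the steering matrix; the ULA model, the source cross-spectral matrix and all names are assumptions for illustration.

```python
import numpy as np

N, P = 6, 2
theta = np.deg2rad([10.0, 25.0])                  # hypothetical source directions
A = np.exp(-1j * np.pi * np.outer(np.arange(N), np.sin(theta)))

# Hermitian positive definite source matrix (partially correlated sources).
Gamma_s = np.array([[2.0, 0.3 + 0.1j],
                    [0.3 - 0.1j, 1.0]])
Gamma = A @ Gamma_s @ A.conj().T                  # noise-free cross-spectral matrix

G11 = Gamma[:P, :P]                               # P x P block
G12 = Gamma[:P, P:]                               # P x (N-P) block
Pi_hat = np.linalg.solve(G11, G12)                # solves G11 @ Pi = G12

# Reference: Pi^+ = Y X^{-1}, so Pi is its conjugate transpose.
Pi_ref = (A[P:, :] @ np.linalg.inv(A[:P, :])).conj().T
```

With noisy data the equality no longer holds exactly, which is why a minimization is used in that case.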

In  a  noisy  situation,  the  optimal  propagator  is  obtained  by 
the  constrained  minimization  problem 


min 
n(/i) 

They  used  the  Frobenius  matrix  norm 


for  j  = 


4. Extension of the propagator method to wideband signals


4.1.  Incoherent  propagator 


The analysis bandwidth is divided into $M$ frequencies. The narrowband propagator is applied at each frequency bin. The final result is obtained by averaging the different results. The directions of the sources are estimated by plotting, as a function of $\theta$, the following measure:


1  1 

4.2.  Coherent  propagator 

The transformation matrices are used at each frequency bin to obtain the focused propagator at the center frequency; all the transformed propagators are then coherently averaged to obtain the coherent propagator, i.e.

$$D(f_j)\,\Pi(f_j) = \Pi(f_0)$$

where $D(f_j)$ is the focusing matrix, and

$$\hat{\Gamma}_{11}(f_0) = X(f_0)\,\hat{\Gamma}_s(f_0)\,X^+(f_0)$$

with $\hat{\Gamma}_s(f_0) = \frac{1}{M}\sum_{j=1}^{M} X^{-1}(f_j)\,\Gamma_{11}(f_j)\,\left(X^+(f_j)\right)^{-1}$.

$X(f_j)$ is constructed by using initial estimates of the directions of the sources. $\Gamma_{12}(f_0)$ is a block matrix of the cross-spectral matrix at the focusing frequency $f_0$.

The transformation matrix is given by:

$$D(f_j) = \Pi(f_0)\,\Gamma_{12}^+(f_j)\,\left[\Gamma_{12}(f_j)\,\Gamma_{12}^+(f_j)\right]^{-1}\Gamma_{11}(f_j)$$

Using these transformation matrices, all the propagators can be combined to find the focused propagator in the following manner:

$$\Pi(f_0) = \frac{1}{M}\sum_{j=1}^{M} D(f_j)\,\Pi(f_j)$$

The obtained propagator is now used to construct the coherent matrix, given by:

$$Q^+(f_0) = [\,\Pi^+(f_0) \mid -I\,]$$
We have $Q^+(f_0)\, a_p(f_0) = 0$ for $p = 1, \ldots, P$. Using this result, the directions of the sources are given by the values of $\theta$ for which the function $J(\theta)$ is maximized:

$$J(\theta) = \frac{1}{\left\| Q^+(f_0)\, a(f_0, \theta) \right\|} \quad \text{for } \theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$$

5.  Algorithm  for  estimating  the  antenna  shape 

The source vectors contain $2(N - 1)$ unknown parameters, corresponding to $(N - 1)$ moduli and $(N - 1)$ phases; however, there are only $2(N - 1)$ free parameters per source, which allows only one source to be treated. To overcome this difficulty, a two-step algorithm is used:

- In the first step, the moduli are eliminated by using the coherency matrix of the received data, and the phase distribution along the antenna is then estimated with the conjugate gradient algorithm.

- In the second step, the phase estimates obtained in the first step are introduced into the cross-spectral matrix, and the moduli can then be estimated.

Note that, to separate the different contributions of the phase estimates and obtain the antenna shape, an algorithm such as a multidimensional Wiener filter can be used.


6. Simulation results

In the following simulations, we consider an antenna of $N = 20$ equispaced sensors with an arbitrary distortion compared to a linear antenna. The source signals are temporally stationary zero-mean bandpass white Gaussian processes with the same bandwidth [100, mHz]. Two source signals impinge on the array at $\theta_1 = 10°$ and $\theta_2 = 12°$ respectively, with SNR = ^dB. The analysis bandwidth is decomposed into $M = 32$ narrowband components via FFT.

Fig. 1 gives the arithmetic mean of the results obtained with the incoherent propagator method with $P = 2$.

Fig. 2 shows the direction of arrival of the sources obtained with the coherent propagator described above. We remark that, in both cases, the sources are not perfectly localized; however, the coherent method gives better results than the incoherent method.

Fig. 3 shows the estimation of the antenna shape using the proposed method.


Figure 1: DOA of the sources using the incoherent method without antenna correction.


Figure 2: DOA of the sources using the coherent method without antenna correction.


Figure 3: Estimation of the antenna shape.
7. Conclusion


Figure 4: DOA of the sources using the incoherent method after antenna correction.


Figure 5: DOA of the sources using the coherent method after antenna correction.

Fig. 4 and Fig. 5 present the bearing estimation of the sources after compensation of the phase due to the sensor displacements. These results show that, after the antenna correction, the sources are exactly localized, and that the coherent propagator is more efficient than the incoherent treatment.

This numerical example shows the benefit of estimating the antenna shape.


In  this  paper,  we  have  extended  the  propagator  method 
to  the  localization  of  the  wideband  signals  using  an  array 
of  sensors.  This  method  avoids  the  eigendecomposition  of 
the  cross-spectral  matrix.  It  is  based  on  the  transformation 
of  the  narrowband  propagators.  We  have  shown  that  the 
knowledge  of  the  antenna  shape  permits  the  compensation 
of  the  fluctuations  of  the  phase  along  the  antenna  which 
improves  the  localization. 
1. ABSTRACT

In this paper an eigendecomposition technique based on cumulant matrices is proposed to passively localize narrowband non-Gaussian sources in the spherical coordinates, viz., azimuth, elevation, range, using signals recorded by a centro-symmetric cross array. The multiple degrees of freedom available from cumulants are exploited to transform the near-field data into pseudo-data collected by a virtual rectangular array observing virtual far-field sources. The centro-symmetric array structure is preserved in the pseudo-data, thus allowing efficient real-valued processing via Unitary ESPRIT.

2.  INTRODUCTION 

In recent years, several eigendecomposition algorithms have been proposed for passive source localization using sensor arrays. However, most approaches operate under the assumption of far-field sources, are based on the planar wavefront approximation, and consequently can only estimate the azimuth (1-D) or the azimuth and elevation (2-D) using passive sensing (see, e.g., [4]-[7]). In many situations, sources are close to the array and the inherent curvature of the waveforms can no longer be neglected. Recent works on near-field source localization concentrated on estimating the azimuth and range only. The algorithms in [2, 3] either involved multiple 1-D searches of a 2-D MUSIC cost function or Wigner-Ville distributions and provided poor resolution, while in [1] the invariance properties of cumulant matrices were used for range and azimuth estimation. None of the existing works addresses passive 3-D localization, which involves the estimation of the spherical coordinates, namely azimuth, elevation, and range. This paper proposes a 3-D localization algorithm which employs cross-cumulants of signals recorded by a 2-D cross array.

Consider a near-field scenario in which co-channel, narrowband signals from $L$ sources located at azimuth, elevation, and range given by the vector $[\alpha_l, \theta_l, r_l]$ impinge upon a cross-array aligned with the X and Y axes (Figure 1). Although, for simplicity, it is assumed that each of these branches consists of uniformly spaced omnidirectional sensors (with spacing $d$), the algorithm is applicable even when the sensor responses are not identical, as long as the array is conjugate centro-symmetric (see, e.g., [5] for a description of centro-symmetric arrays). With inter-sensor spacing $d$ and $m, n \in \{-M, \ldots, M\}$, the output of the sensor located at coordinates $(md, nd)$ is:

$$u_{m,n}(t) = \sum_{l=1}^{L} s_l(t)\, e^{j\tau_{m,n}(l)} + v_{m,n}(t) \qquad (1)$$

where,  rm,n(0  =  +  WyfW  -t-  <i)yin^  + 

0imn]  is  the  phase  difference  between  the  Ith 
source  signal  at  sensor  {m,  n}  and  that  recorded 
by  the  sensor  located  at  {0,0}.  The  parame¬ 
ters  are  nonlinear  functions  of 

With  0i  =  - sin^ 0; sin 2a/  : 

u!xi  =  —  sin0/  cos  a/,  Wy/  =  —  sin^/  sin  a/, 
^  (1  —  sin^  9i  cos^  a/)  ,  and 

h‘  =  ^ 

The narrowband signals $s_l(t)$ are zero-mean, stationary, mutually independent, with non-zero fourth-order cumulants, while the sensor noise $v_{m,n}(t)$ is modeled as zero-mean, Gaussian and independent of $s_l(t)$. Localization involves the estimation of $[\alpha_l, \theta_l, r_l]$, given the observations $\{u_{m,n}(t)\}$ for $t \in \{0, \ldots, T - 1\}$. The parameter vectors $\omega_x = [\omega_{x1}, \ldots, \omega_{xL}]'$, $\phi_x = [\phi_{x1}, \ldots, \phi_{xL}]'$, $\omega_y = [\omega_{y1}, \ldots, \omega_{yL}]'$, and $\phi_y = [\phi_{y1}, \ldots, \phi_{yL}]'$ are first estimated via subspace methods and then paired to yield the locations.

3.  ALGORITHM  DESCRIPTION 

The proposed algorithm is implemented in two steps. Estimation of $\{\omega_x, \phi_x\}$ ($\{\omega_y, \phi_y\}$) using the signals from the X (Y) subarray is considered in the first step. In Step 2, the elements of $\{\omega_x, \phi_x\}$ are paired with those in $\{\omega_y, \phi_y\}$. This last step is essential to obtain the final source spherical coordinates.

Step 1: Estimation of $\omega_x$, $\phi_x$, $\omega_y$ and $\phi_y$

From (1), the signals collected by the sensors along the X subarray, $u_{m,0}(t)$, are:

$$u_{m,0}(t) = \sum_{l=1}^{L} s_l(t)\, e^{j(\omega_{xl} m + \phi_{xl} m^2)} + v_{m,0}(t). \qquad (3)$$

The model in (3) coincides with the signal model corresponding to azimuth and range only (2-D) estimation of near-field sources with a 1-D array [1].

In order to gain insight into the possible source-sensor configuration which gives rise to the signal model in (3), consider a 2-D plane ABCD containing the X subarray and the $l$th source as shown in Figure 2. From the figure we see that, as far as the X subarray is concerned, source $l$ is at a distance $r_l$ from the array center, at an angle $\mu_l = \sin^{-1}(\sin\theta_l \cos\alpha_l)$ with the perpendicular AB, passing through the array center B and lying in the plane ABCD. In [1], a HOS-based solution that yields paired estimates of $r_l$ and $\mu_l$ was proposed. The same approach can be applied to the model in (3) to obtain $\omega_{xl}$ and $\phi_{xl}$.

However,  here  we  take  a  different  view  point 
and  propose  a  solution  which  exploits  the  centro- 
symmetry  of  fourth-order  cross-cumulant  matrices. 
This  new  algorithm  is  based  on  the  observation  that 
as  a  result  of  the  nonlinear  operations  involved  in 
cumulant  computation,  the  data  collected  from  a 
1-D  linear  array  (the  X  subarray)  observing  near¬ 
field  sources  can  be  transformed  into  pseudo-data 
collected  from  a  virtual  rectangular  array  observing 
virtual far-field sources. The azimuth and elevation of these virtual far-field sources will turn out to be functions of $\omega_{xl}$ and $\phi_{xl}$ in the original data.

3.1  Transformation  of  Data 

Under  the  model  assumptions,  using  (3), 
and  cumulant  properties,  the  fourth-order  cross- 
cumulant  of  the  signals  from  sensors  at  {*-772,0}, 
{m,  0),  {n  —  1,0),  {n,  0}  simplify  to  ; 

C4m,n(T)  =  CUm{ul^  o(^  +  ^)- <-l  o(0- 
L 

C4s, (r)e*^ ,  (4) 

/=! 

where  c^siir)  =  cum{sj'(/ -f  r),  s?(f),  ^*(0,  s/(/)}  is 
the  fourth-order  cumulant  of  the  source  signal  si{t). 
Notice that the cumulants of the noise term $v_{m,0}(t)$ do not appear, owing to the fact that cumulants of order greater than two vanish when the process is Gaussian. By collecting the cumulants $c_{4m,n}(\tau)$ for $-M \leq m, n \leq M$, the matrix consisting of cross-cumulants is obtained as :

L 

C4.(r)  =  (5) 

/=! 

Note  that  the  arrangement  in  (5)  resembles 
the  response  of  a  rectangular  array  observ¬ 
ing  far-field  sources.  With  p  representing  ei¬ 
ther  ijJxi  or  the  partial  steering  vectors 

a(/7)  =  . . . ,  1,  , . . ,  are 

centro-symmetric  with  respect  to  the  array  cen¬ 
ter.  Then,  A{u)xU<j>xi)  =  is  the  ar¬ 

ray  steering  matrix  for  the  /th  source  observed  by 
a virtual rectangular array of size $K \times K$ (where $K = 2M + 1$) with elements uniformly spaced at $\{md, nd\}$ for $m, n \in \{-M, \ldots, M\}$. Consequently, $C_{4x}(\tau)$ in (5) can be thought of as the data collected by an array which observes virtual far-field sources with direction cosines proportional to $2\omega_{xl}$ and $2\phi_{xl}$.

Instead of arranging the cumulants in a rectangular array we can collect them in a single $K^2 \times 1$ vector to obtain :

L 

C4.(r)  =  (6) 

where,  ^^{(jJxU<I>xi)  =  '^^c[A{u)xu<t>xi)\  is  obtained 
by column stacking the elements of $A(\omega_{xl}, \phi_{xl})$. For the sake of notational convenience we henceforth denote $A(\omega_{xl}, \phi_{xl})$ as $A_l$. Assuming that the source cumulants $c_{4s_l}(\tau)$ are non-zero for lags $\tau \in \{0, 1, \ldots, T_{max} - 1\}$, the vectors $c_{4x}(\tau)$ are collected in a matrix of size $(K^2 \times T_{max})$,

$$C_x = [\,c_{4x}(0),\; c_{4x}(1),\; \ldots,\; c_{4x}(T_{max} - 1)\,] , \qquad (7)$$

so as to obtain $T_{max}$ “snapshots” from the virtual
rectangular  array.  Each  vector  C437(r)  belongs  to 
«S,  the  signal  subspace  spanned  by  the  virtual  array 
steering  vectors  <^>37/)}^^.  Thus,  the  cu¬ 

mulant  based  preprocessing  maps  the  2-D  near-field 
azimuth-range  estimation  (using  a  linear  array)  to 
an  equivalent  2-D  far-field  azimuth- elevation  prob¬ 
lem  (arising  from  a  rectangular  array). 

3.2  2-D  Unitary  ESPRIT 

As mentioned earlier, several algorithms have been proposed to solve this 2-D far-field problem (see, e.g., [5]-[7]). From (6), we observe that the problem is two-fold: (i) estimation of the model parameters $\omega_{xl}, \phi_{xl}$ and (ii) pairing the parameters. We adopt the principle behind the Unitary ESPRIT algorithm in [5] since it not only results in efficient real-valued processing but also automatically yields paired estimates of the model parameters. Owing to the symmetry in the cumulant definition of (4), the centro-symmetric property of the original array carries over to the new virtual rectangular array $A_l$. The description of the algorithm in the sequel closely follows that in [5] but with pseudo-data $C_x$ instead of the original data.

The partial steering vectors $a(\omega_{xl})$ and $a(\phi_{xl})$ satisfy the following invariance relationship

$$e^{j2\mu} J_1 a(\mu) = J_2 a(\mu), \quad \text{for } \mu = \omega_{xl} \text{ or } \phi_{xl}. \qquad (8)$$

With $I_{(K-1)}$ denoting an identity matrix of size $(K - 1)$, the selection matrices $J_1 = [I_{(K-1)} \;\; 0_{(K-1) \times 1}]$ and $J_2 = [0_{(K-1) \times 1} \;\; I_{(K-1)}]$ select the first and last $(K - 1)$ rows of $a(\mu)$. Let us next define a unitary matrix with conjugate centro-symmetric rows as follows:

$$Q_K = \frac{1}{\sqrt{2}} \begin{bmatrix} I_M & 0 & jI_M \\ 0^T & \sqrt{2} & 0^T \\ \Pi_M & 0 & -j\Pi_M \end{bmatrix}. \qquad (9)$$

In (9), $\Pi_M$ is the permutation matrix with ones on the anti-diagonal. It can be shown that $Q_K$ transforms conjugate centro-symmetric vectors into real-valued vectors. Thus, the real-valued manifold corresponding to $A_l$ is

$$d_l = \mathrm{vec}[Q_K^H A_l Q_K^*]. \qquad (10)$$
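As a quick numerical illustration of the transformation in (9), the following sketch builds the matrix for odd $K = 2M + 1$ and checks two properties stated in the text: that it is unitary and that it maps a conjugate centro-symmetric vector to a real-valued one. The function name `unitary_q` and the test vector are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def unitary_q(K):
    """Sketch of the unitary matrix Q_K in (9) for odd K = 2M + 1 (assumed form)."""
    if K % 2 == 0:
        raise ValueError("this sketch covers odd K only")
    M = (K - 1) // 2
    I = np.eye(M)
    Pi = np.fliplr(I)                          # permutation matrix, ones on the anti-diagonal
    z = np.zeros((M, 1))
    top = np.hstack([I, z, 1j * I])
    mid = np.hstack([z.T, [[np.sqrt(2.0)]], z.T])
    bot = np.hstack([Pi, z, -1j * Pi])
    return np.vstack([top, mid, bot]) / np.sqrt(2.0)

K = 5
Q = unitary_q(K)

# A phase-centred steering vector is conjugate centro-symmetric: Pi a* = a.
m = np.arange(-(K // 2), K // 2 + 1)
a = np.exp(1j * 0.7 * m)                       # 0.7 rad is an arbitrary test phase
r = Q.conj().T @ a                             # should be real-valued
```

The real-valuedness follows because the matrix satisfies $\Pi_K Q_K^* = Q_K$, so $Q_K^H a$ equals its own conjugate whenever $\Pi_K a^* = a$.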


Consequently, a real-valued signal subspace can be generated by the columns of $D = [d_1, \ldots, d_L]$. Let $E_s$ be the orthonormal basis for this subspace. Consequently, there exists a real, non-singular $T$ such that $E_s = DT$. Using (8), the following invariance relations can be shown to hold [5]:

$$K_{\omega 1} E_s \Psi_\omega = K_{\omega 2} E_s, \quad K_{\phi 1} E_s \Psi_\phi = K_{\phi 2} E_s \qquad (11)$$

$$\text{where } \Psi_\omega = T^{-1} \Omega_\omega T, \quad \Psi_\phi = T^{-1} \Omega_\phi T \qquad (12)$$

and $K_{\omega 1} = I_K \otimes K_1$, $K_{\omega 2} = I_K \otimes K_2$, $K_{\phi 1} = K_1 \otimes I_K$, $K_{\phi 2} = K_2 \otimes I_K$, $K_1 = \mathrm{Re}\{Q_{K-1}^H J_2 Q_K\}$, $K_2 = \mathrm{Im}\{Q_{K-1}^H J_2 Q_K\}$.

Equations (11) and (12) are similar to those that show up in the classical ESPRIT. We can solve for $\Psi_\omega$ and $\Psi_\phi$ via the TLS solution in the presence of estimation errors which arise when finite data are used in practice. The matrices $\Omega_\omega = \mathrm{diag}[\tan(\omega_{x1}), \ldots, \tan(\omega_{xL})]$ and $\Omega_\phi = \mathrm{diag}[\tan(\phi_{x1}), \ldots, \tan(\phi_{xL})]$ are thus obtained as the eigenvalues of $\Psi_\omega$ and $\Psi_\phi$ respectively.

The two real-valued eigen decompositions in (12) can be replaced by the following complex valued eigen decomposition which also yields an automatic pairing of $\{\omega_{xl}, \phi_{xl}\}$:

$$\Psi_\omega + j\Psi_\phi = T^{-1}(\Omega_\omega + j\Omega_\phi)T. \qquad (13)$$

Thus, the $l$th parameter pair can be obtained from the {real, imaginary} parts of the $l$th eigenvalue of $\Psi_\omega + j\Psi_\phi$ in (13).
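The automatic pairing can be illustrated with a small noise-free sketch. It assumes the two real matrices share the same eigenvector matrix $T$, as in (12), and that the eigenvalues are $\tan(\omega_{xl})$ and $\tan(\phi_{xl})$; all numerical values and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 3
omega = np.array([0.30, -0.50, 0.80])      # hypothetical omega_xl values (radians)
phi = np.array([0.10, 0.40, -0.20])        # hypothetical phi_xl values (radians)

# Shared real eigenvector matrix T (non-singular with probability one).
T = rng.standard_normal((L, L))
T_inv = np.linalg.inv(T)
Psi_omega = T_inv @ np.diag(np.tan(omega)) @ T
Psi_phi = T_inv @ np.diag(np.tan(phi)) @ T

# One complex eigendecomposition returns both parameters of each source together.
eigvals = np.linalg.eigvals(Psi_omega + 1j * Psi_phi)
omega_hat = np.arctan(eigvals.real)
phi_hat = np.arctan(eigvals.imag)
pairs = sorted(zip(omega_hat.round(6), phi_hat.round(6)))
```

Because each eigenvalue carries $\tan(\omega_{xl})$ in its real part and $\tan(\phi_{xl})$ in its imaginary part, no separate matching step is needed within the X subarray.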

The basis $E_s$ needed in the preceding relations can be obtained from the transformed data $C_{xr} = (Q_K \otimes Q_K)^H C_x$, which can be shown to lie in the subspace spanned by $D$. Thus $E_s$ can be extracted as the $L$ left singular vectors of $[\mathrm{Re}(C_{xr}) \;\; \mathrm{Im}(C_{xr})]$ corresponding to the $L$ largest
singular  values,  provided  L  <  min{A'‘  +  1}. 

The signal model for the Y subarray data is similar to (3) with $\{\omega_{xl}, \phi_{xl}\}$ replaced by $\{\omega_{yl}, \phi_{yl}\}$. Hence, we can obtain the paired estimates of the parameters $\{\omega_{yl}, \phi_{yl}\}$ by following the same steps but using $C_y$ instead of $C_x$. Since we can only obtain the paired-parameter estimates with an unknown permutation ambiguity, we denote these as $\{\omega_x, \phi_x\}$ and $\{\omega_y, \phi_y\}$.

Step 2: Pairing of $\{\omega_x, \phi_x\}$ and $\{\omega_y, \phi_y\}$

In computing the location coordinates $[\alpha_l, \theta_l, r_l]$ for the sources, the source parameter from the $\{\omega_x, \phi_x\}$ set has to be paired with the right one from the $\{\omega_y, \phi_y\}$ set so that the nonlinear equations in (2) can be solved. There are $L!$ possible pairings for the $L$ sources. Let

$$u(t) = [u_{-M-1,0}(t), \ldots, u_{M,0}(t), u_{0,-M-1}(t), \ldots, u_{0,-1}(t), u_{0,1}(t), \ldots, u_{0,M}(t)]^T \qquad (14)$$

denote the $(2K + 1) \times 1$ data vector obtained by stacking the signals collected from the X and Y subarray. Let $B_{(2K+1) \times L}$ be the corresponding array steering matrix, $s(t)_{L \times 1}$ the source vector and $v(t)$ the noise vector, then the matrix form of (1) is:

$$u(t) = Bs(t) + v(t). \qquad (15)$$

For each combination $\{\omega_{xp}, \phi_{xp}, \omega_{yq}, \phi_{yq}\}$, the "possible" array steering matrix $B_p$ is constructed and the model mismatch error $e_p$,

$$e_p = \sum_t \| B_p^{\perp} u(t) \|^2, \qquad (16)$$

is evaluated where $B_p^{\perp}$ is the projection matrix onto the null space of $B_p$. The combination which minimizes $e_p$ is then the correct pairing. In contrast to the least-squares estimation of the source coordinates directly from the data, the minimization here is only over a parameter space of size $L!$ (finite set). Finally, the spherical coordinates of the sources are obtained from $\{\omega_x, \phi_x, \omega_y, \phi_y\}$ via (2).
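The exhaustive search over the $L!$ pairings can be sketched as follows. The steering model used here (a quadratic, Fresnel-type phase $\omega m + \phi m^2$ on each subarray) is an assumption made only so the example is self-contained, since equations (1)-(3) of the paper are not reproduced in this section; the projector is taken onto the orthogonal complement of the column space of the candidate steering matrix. All function names are hypothetical.

```python
import itertools
import numpy as np

def steering(omega_x, phi_x, omega_y, phi_y, M):
    """Hypothetical stacked X/Y steering matrix, shape (4M + 3, L)."""
    mx = np.arange(-M - 1, M + 1)[:, None]                      # X indices, reference sensor included
    my = np.array([n for n in range(-M - 1, M + 1) if n != 0])[:, None]
    Bx = np.exp(1j * (omega_x * mx + phi_x * mx**2))
    By = np.exp(1j * (omega_y * my + phi_y * my**2))
    return np.vstack([Bx, By])

def mismatch(B, U):
    """Energy of the snapshots U that falls outside the column space of B."""
    P_perp = np.eye(B.shape[0]) - B @ np.linalg.pinv(B)
    return np.linalg.norm(P_perp @ U) ** 2

def best_pairing(wx, px, wy, py, U, M):
    """Permutation q of the Y estimates that minimises the mismatch error."""
    perms = itertools.permutations(range(len(wx)))
    return min(perms, key=lambda q: mismatch(
        steering(wx, px, wy[list(q)], py[list(q)], M), U))

# Synthetic check: two sources, Y estimates delivered in swapped order.
rng = np.random.default_rng(1)
M, L, T = 1, 2, 200
wx, px = np.array([0.4, -0.3]), np.array([0.05, 0.02])
wy, py = np.array([0.2, 0.6]), np.array([0.03, 0.07])
S = rng.standard_normal((L, T)) + 1j * rng.standard_normal((L, T))
U = steering(wx, px, wy, py, M) @ S
U += 0.01 * (rng.standard_normal(U.shape) + 1j * rng.standard_normal(U.shape))
shuffled = np.array([1, 0])
q = best_pairing(wx, px, wy[shuffled], py[shuffled], U, M)
```

The cost grows as $L!$, which is acceptable for the small number of sources considered here but would need pruning for larger $L$.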
4. SIMULATION RESULTS

Example 1: The algorithm is tested with two non-Gaussian sources at $[\alpha, \theta, r] = [45°, 45°, 1.5\lambda]$ and $[-20°, 10°, 2\lambda]$ and with $2M + 2 = 4$ elements in each of the subarrays. The source signals are generated as BPSK signals filtered with a first-order Butterworth filter with a cutoff frequency of $0.4\pi$. The sensor noise is circular white Gaussian (SNR = 30 dB). The cumulant matrices are estimated from $T = 2048$ data samples ($T_{max} = 5$). Figure 3 shows the sensor array configuration, with thick circles representing the sensor locations. The actual directions of arrival and ranges as seen by the reference sensor and the estimates obtained from the proposed method (result of 500 trials) are also indicated. The estimates (mean ± std. dev.), $[\hat\alpha_1, \hat\theta_1, \hat r_1] = [44.76° \pm 0.6, 44.48° \pm 1.65, 1.54\lambda \pm 0.23]$ and $[\hat\alpha_2, \hat\theta_2, \hat r_2] = [-17.55° \pm 6.66, 10.19° \pm 0.67, 2.03\lambda \pm 0.31]$, are close to the true values and support the proposed approach.

Example 2: The algorithm is tested on a second
set of sources at $[\alpha, \theta, r] = [20°, 45°, 1.5\lambda]$ and
$[20°, 10°, 2\lambda]$ respectively (same azimuth). The
estimates  [di,^i,fi]  ~  [20.19°  zb  1.68,44.69°  dz 
1.53,1.67A  ±  0.66]  and  [03,^2, 7*2]  =  [18.7°  zb 
7.11, 10.41°±  1.18, 12. lA zb 0.55]  are  close  to  the  true 
values  (Figure  4). 

Example  3:  As  a  final  example  we  consider  two 
sources  arriving  at  the  same  elevation  angle.  Table  1 
shows  the  true  parameters  along  with  the  estimates 
for  two  sample  sizes.  As  expected,  the  estimation 
variance  decreases  with  larger  sample  size. 


Table 1

Source 1            α_1            θ_1            r_1
True                45°            20°            1.5λ
Est. (T = 2048)     47.6 ± 0.8     19.4 ± 2.1     1.58λ ± 0.19
Est. (T = 4096)     46.7 ± 7.3     19.5 ± 1.9     1.58λ ± 0.18

Source 2            α_2            θ_2            r_2
True                (illegible)    20°            1.5λ
Est. (T = 2048)     -9.1 ± 2.8     21.7 ± 4.5     1.03λ ± 0.58
Est. (T = 4096)     -9.4 ± 2.2     21.3 ± 3.8     1.91λ ± 0.31
Figure 1. Near-field scenario.


Figure  2.  Source  as  seen  by  the  X  subarray. 
Abstract

This  paper  extends  earlier  results  by  Ward,  Kennedy  and 
Williamson  [1,2]  for  the  design  of  broadband  arrays  with 
frequency-invariant  (FI)  beam  patterns  to  the  case  where 
it  is  desired  to  place  an  exact  null  in  a  given  direction. 
The  beamforming  is  done  using  appropriately  selected  FIR 
filters. 

First,  the  previous  results  for  generating  FI  beam  patterns 
using  FIR  filters  are  briefly  summarised.  Second,  new  results 
which  give  the  conditions  required  for  exact  nulls  in  the 
beam  pattern  for  all  frequencies  in  any,  possibly  non-FI, 
beam  pattern  are  given.  Third,  a  method  of  generating  beam 
patterns  which  possess  an  exact  null  and  which  are  close,  in 
an  L2  sense,  to  an  arbitrary  FI  pattern  is  presented.  Finally, 
some  preliminary  experimental  results  which  corroborate 
the  theoretical  findings  are  presented. 

1.  Problems  Addressed 

Consider an array of $N$ spatially separated omnidirectional microphones. The array has a nominal aperture of $P$ half-wavelengths at a given frequency. The signals from each sensor are sampled at sampling frequency $f_s$ and are filtered using $L$-tap finite impulse response filters with frequency responses

$$H_n(f) := \sum_{m=0}^{L-1} h_n[m]\, e^{-j2\pi f m}, \quad n = 1, 2, \ldots, N.$$

This is illustrated in Figure 1.

We wish to select the filter coefficients, $h_n[m]$, and the sensor locations, $x_n$, so that the farfield array response from direction $\theta$

$$A(\theta, f) := \sum_{n=1}^{N} H_n(f) \exp\left(j2\pi f \frac{f_s x_n}{c} \sin\theta\right) = h^T d(\theta, f) \qquad (1)$$


I  The  authors  wish  to  acknowledge  the  funding  of  the  activities  of  the 
Cooperative  Research  Centre  for  Robust  and  Adaptive  Systems  by  the 
Australian  Commonwealth  Government  under  the  Cooperative  Research 
Figure 1. The array geometry assumed.


possesses certain properties over the frequency range $f \in [f_L, f_U]$. The velocity of wave propagation is denoted $c$. The $NL$-dimensional vector of FIR coefficients is

$$h^T = [h_1[0] \ldots h_N[0] \;\; \ldots \;\; h_1[L-1] \ldots h_N[L-1]]$$

and

$$d(\theta, f) = \left[e^{j2\pi f \tau_1(\theta)}, \ldots, e^{j2\pi f \tau_N(\theta)}, \; \ldots, \; e^{j2\pi f[\tau_1(\theta) - L + 1]}, \ldots, e^{j2\pi f[\tau_N(\theta) - L + 1]}\right]^T$$

is the $NL$-dimensional delay vector with

$$\tau_n(\theta) := \frac{f_s x_n}{c} \sin\theta.$$

The property of frequency invariance has been investigated previously [1-3]. This paper examines obtaining exact nulls in a beam pattern and the interaction of this property with frequency invariance.
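The equivalence between the filter-sum form of the array response and the inner product $h^T d(\theta, f)$ can be checked numerically. The sketch below assumes that $f$ is a normalised frequency (cycles per sample) and that the delays $\tau_n(\theta)$ are measured in samples; the sensor positions and all function names are hypothetical.

```python
import numpy as np

def response_via_filters(h, x, theta, f, fs, c):
    """Sum over sensors of H_n(f) * exp(j 2 pi f tau_n(theta)); h has shape (N, L)."""
    m = np.arange(h.shape[1])
    H = h @ np.exp(-2j * np.pi * f * m)            # H_n(f) for n = 1..N
    tau = fs * x * np.sin(theta) / c               # delays in samples
    return np.sum(H * np.exp(2j * np.pi * f * tau))

def response_via_delay_vector(h, x, theta, f, fs, c):
    """Same response written as h^T d(theta, f) with stacked NL-vectors."""
    N, L = h.shape
    tau = fs * x * np.sin(theta) / c
    h_vec = h.T.reshape(-1)                        # [h_1[0]..h_N[0], ..., h_1[L-1]..h_N[L-1]]
    d = np.concatenate([np.exp(2j * np.pi * f * (tau - m)) for m in range(L)])
    return h_vec @ d

rng = np.random.default_rng(0)
N, L = 4, 3
h = rng.standard_normal((N, L))
x = np.array([-0.15, -0.05, 0.05, 0.15])           # hypothetical sensor positions in metres
a1 = response_via_filters(h, x, np.pi / 6, 0.2, 8000.0, 343.0)
a2 = response_via_delay_vector(h, x, np.pi / 6, 0.2, 8000.0, 343.0)
```

Both routes evaluate the same double sum over sensors and taps, so the two values agree to numerical precision.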
1.1. Problem One: FI

Suppose it is desired that

$$A_{FI}(\theta, f) = A(\theta), \quad f \in [f_L, f_U].$$

It was shown in [1] how to choose the $x_n$ locations and that, when chosen appropriately, the array filters should have the dilation property

$$H_n(f) = H_{ref}\left(\frac{x_n}{x_{ref}} f\right)$$

where $H_n(f)$ is the filter response of the $n$th filter and $H_{ref}(f)$ is the primary filter response at some reference location $x_{ref}$.

Ward, Kennedy and Williamson [2] considered two possible filter bank implementations with this property: a multirate approach and a single rate approach. This paper follows the single rate approach with the primary filter coefficients given by

$$h^p_n[m] = \frac{1}{\gamma_n} \sum_k h_{ref}[k]\, \mathrm{sinc}\left(\frac{m}{\gamma_n} - k\right)$$

where $\gamma_n = x_n / x_{ref}$. The secondary filter coefficients are chosen so that $H^s(f)$ is a differentiator over $f \in [f_L, f_U]$.

The full frequency invariant shading filters are then

$$h^{FI}_n[m] = g_n h^p_n[m] * h^s[m]$$

where $*$ denotes convolution in the $m$ index and $g_n$ is a spatial weighting term to account for the (possibly) nonlinear array spacing.

1.2. Problem Two: END

The first new results of this paper are Proposition 1 and its Corollary, which are the conditions for Exact Null Design (END).

Proposition 1 (Condition for a Broadband Null)
A broadband null at $\theta_0$ will be available if and only if either

$$A(\theta_0, f) := \sum_{n=1}^{N} H_n(f)\, e^{j2\pi f \tau_n(\theta_0)} = 0, \quad \forall f \qquad (2)$$

or, equivalently,

$$\sum_{n=1}^{N} h_n[m] * \frac{\sin(\pi[m + \tau_n(\theta_0)])}{\pi[m + \tau_n(\theta_0)]} = 0, \quad \forall m \qquad (3)$$

where $*$ again denotes convolution in the $m$ index. □

Proof. From (1), the array response in direction $\theta_0$ is

$$\begin{aligned}
A(\theta_0, f) &= \sum_{n=1}^{N} H_n(f) \exp\left(j2\pi f \frac{f_s x_n}{c} \sin\theta_0\right) && (4) \\
&= \sum_{n=1}^{N} H_n(f)\, e^{j2\pi f \tau_n(\theta_0)} \\
&= \sum_{n=1}^{N} \sum_{m=0}^{L-1} h_n[m]\, e^{-j2\pi f [m - \tau_n(\theta_0)]} && (5) \\
&= \sum_{m=0}^{L-1} \left[ \sum_{n=1}^{N} h_n[m] * \frac{\sin(\pi[m + \tau_n(\theta_0)])}{\pi[m + \tau_n(\theta_0)]} \right] e^{-j2\pi f m} && (6)
\end{aligned}$$

Equation (5) yields (2) and the inverse discrete Fourier transform of (6) gives (3). ■

It is not immediately clear how (2) and (3) may be easily enforced. The following result shows this.

Corollary 1 (Integer Delay Property)
If $\tau_n(\theta_0)$ is an integer then

$$\sum_{n=1}^{N} h_n[m + \tau_n(\theta_0)] = 0, \quad \forall m \qquad (7)$$

is a sufficient condition for (3) being satisfied. □

Proof. If $\tau_n(\theta_0) \in \mathbb{Z}$ then (3) becomes

$$\sum_{n=1}^{N} h_n[m] * \delta[m + \tau_n(\theta_0)] = 0, \quad \forall m$$

where $\delta$ is the Kronecker delta. ■

Remark 1: Note that it is always possible to place a null at $\theta_0 = 0$ because in this case

$$\tau_n = \frac{f_s}{c} x_n \sin\theta_0 = 0, \quad \forall n$$

and (3) reduces to requiring

$$\sum_{n=1}^{N} H_n(f) = 0, \quad \forall f.$$

□

Using this idea, the END condition for a null at broadside may be written as

$$C^T h = 0$$

where

$$C := I_L \otimes 1_N$$

where $I_L$ is the $L \times L$ identity matrix and $1_N$ is the $N$-vector of ones.
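The broadside END condition can be illustrated numerically. The sketch below assumes the coefficient ordering $[h_1[0] \ldots h_N[0] \ldots h_1[L-1] \ldots h_N[L-1]]$ given earlier, builds the constraint matrix as a Kronecker product, and checks that filters whose taps sum to zero across sensors produce a null at broadside for every frequency. The sizes and random coefficients are arbitrary.

```python
import numpy as np

N, L = 4, 3
rng = np.random.default_rng(0)

# Filters of shape (N, L), adjusted so that sum_n h_n[m] = 0 for every tap index m.
h = rng.standard_normal((N, L))
h -= h.mean(axis=0)

h_vec = h.T.reshape(-1)                            # tap-major stacking, length N*L
C = np.kron(np.eye(L), np.ones((N, 1)))            # NL x L constraint matrix
constraint = C.T @ h_vec                           # equals sum_n h_n[m] for each m

# Broadside response: all delays are zero, so A(0, f) = sum_n H_n(f).
f = np.linspace(0.0, 0.5, 11)                      # normalised frequencies (assumption)
H = h @ np.exp(-2j * np.pi * np.outer(np.arange(L), f))
broadside = H.sum(axis=0)
```

Each column of the constraint matrix selects the $N$ coefficients that share one tap index, so the $L$ constraints leave $NL - L$ degrees of freedom for shaping the rest of the beam pattern.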
1.3. Problem Three: FI with END

Designing a frequency invariant array generates array filters $h_{FI}$. Placing an exact null imposes the underdetermined constraints (7). The question of whether it is possible to do both arises, taking into account the slack of the exact null constraints.

The approach we take is similar to that of Frost [4]. Assume we time-delay beamsteer the FI beamformer to direction $-\theta_0$. The effect of this is to move the null to broadside. In the remainder, we will use the tilde symbol $\tilde{(\cdot)}$ to indicate a steered quantity.

The steered filter coefficient $NL$-vectors, with END conditions imposed, are given by $\tilde h_{FIEND} = \tilde h_{FI} + \tilde h_e$ where $\tilde h_e$ are the deviations from $\tilde h_{FI}$ which allow for exact null design.

We approach the problem by imposing the exact null constraints (7) while minimising the cost functional

$$\begin{aligned}
J &= \int_{-\pi}^{\pi} \int_{f_L}^{f_U} |\tilde A_{FIEND}(\theta, f) - \tilde A_{FI}(\theta, f)|^2 \, df \, d\theta \\
&= \int_{-\pi}^{\pi} \int_{f_L}^{f_U} |\tilde h_e^T d(\theta, f)|^2 \, df \, d\theta = \tilde h_e^T D \tilde h_e \qquad (8)
\end{aligned}$$

where

$$D := \int_{-\pi}^{\pi} \int_{f_L}^{f_U} d(\theta, f)\, d^H(\theta, f) \, df \, d\theta.$$

The best $\tilde h_e$ is then found as the solution to the optimisation problem

$$\min_{\tilde h_e} \tilde h_e^T D \tilde h_e$$

subject to $C^T(\tilde h_{FI} + \tilde h_e) = 0$. The solution to this problem is

$$\tilde h_e^{opt} = -D^{-1} C \left[ C^T D^{-1} C \right]^{-1} a$$

where $a = C^T \tilde h_{FI}$.
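A minimal numerical sketch of this constrained minimisation follows. It assumes a real symmetric positive definite stand-in for $D$ (the true matrix is the integral defined above) and the broadside constraint matrix $C = I_L \otimes 1_N$; it then checks that the closed-form deviation satisfies the constraint and costs no more than another feasible deviation. All values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 3
C = np.kron(np.eye(L), np.ones((N, 1)))            # NL x L, full column rank

# Stand-in for D: any symmetric positive definite matrix (assumption).
G = rng.standard_normal((N * L, N * L))
D = G @ G.T + N * L * np.eye(N * L)
h_fi = rng.standard_normal(N * L)                  # hypothetical steered FI coefficients

# Closed form: h_e = -D^{-1} C (C^T D^{-1} C)^{-1} a with a = C^T h_fi.
a = C.T @ h_fi
Dinv_C = np.linalg.solve(D, C)
h_e = -Dinv_C @ np.linalg.solve(C.T @ Dinv_C, a)

# Another feasible deviation: add any vector from the null space of C^T.
z = rng.standard_normal(N * L)
z -= C @ np.linalg.solve(C.T @ C, C.T @ z)
h_alt = h_e + z

cost_opt = h_e @ D @ h_e
cost_alt = h_alt @ D @ h_alt
```

Using `np.linalg.solve` instead of forming explicit inverses is a numerical convenience only; the result is the same expression.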

The  matrix  D  is  of  full  rank,  provided  no  two  sensor 
locations  coincide;  C  is  also  of  full  rank.  This  solution 
has  the  same  form  as  that  presented  in  [4].  The  optimum 
unsteered  response  is  then 


afiend(oj)  ^  +  hTfdiej) 


/*7'[o]  ■ 

- 1] 

h°^'[L  -  1  -  r,(0o)] 

hf[L  -  1]  . 

.  hf[L  -  1  -  r,v(^o)]  . 

2.  Example  Array  Design 

The figures show the resulting array responses for the following array parameters:

$f_L = 1000$ Hz, $f_U = 2000$ Hz, $N = 7$.

The primary filters in the FI design are chosen to be 5 taps, the secondary filters are 3 taps long. The END design required a null to be placed at $\theta = \pi/4 = 45°$. The array was 36 cm long.

Because  the  array  was  to  be  tested  in  a  small  anechoic 
chamber,  the  farfield  design  methodology  presented  here 
was  modified  to  allow  for  a  nearfield  design.  For  the  results 
shown,  the  source  was  2.8  metres  from  the  array.  Space 
precludes  inclusion  of  the  derivations  for  the  nearfield  case 
[5].  For  an  alternative  technique,  see  [6,7]. 

Figures 2, 3 and 4 show array responses at 20 regularly spaced frequencies between 1000 Hz and 2000 Hz for the original FI design [2], an END design with no account taken of frequency invariance and the FIEND design where filters are adjusted to cater for the null while minimising the cost function (8).

Remark 2: Clearly the FIEND responses more closely resemble the FI-only response than does the END response; the value of $J$ for the END response plotted is 0.0487 and the value of $J$ for the FIEND response is 0.0114. □

Time  constraints  precluded  measurement  of  the  array  re¬ 
sponses  of  all  designs;  only  the  END  design  was  tested 
empirically. 

Good correspondence between theoretical and measured results was obtained; the responses for frequencies 995 Hz, 1248 Hz, 1505 Hz, 1748 Hz and 2004 Hz are displayed in Figure 5. Some problems were encountered with the response measured at 1748 Hz.

3.  Conclusions 

We  have  presented  one  new  result  which  allows  exact 
nulls  to  be  incorporated  into  any  broadband  array  design. 
If  frequency  invariance  is  required,  another  new  result,  pre¬ 
sented  here,  allows  exact  nulls  while  minimising  a  mean 
square  error  cost  between  a  frequency  invariant  design  and 
the  design  which  includes  a  null. 

An  exact  null  design  was  tested  in  the  laboratory;  theo¬ 
retical  and  measured  responses  compared  favourably. 
Figure 2. Array responses at various frequencies for FI array.

Figure 3. Array responses at various frequencies for END array.

Figure 4. Array responses at various frequencies for FIEND array (plot title: "Nearfield FIB with FIEND").

Figure 5. Predicted ( - ) and measured (x) array responses at various frequencies for END array (panel titles include "Frequency: 995Hz" and "Frequency: 1249Hz").
Abstract

When  the  signal  to  noise  ratio  is  relatively  high,  the 
angle  of  arrival  of  the  strongest  signal  can  be  estimated 
with  a  very  simple  method  and  a  small  3-D  sensor  array. 
The  differences  in  the  arrival  times  of  the  wideband  signal 
received  by  spatially  separated  sensors  are  estimated 
using  the  polarity  coincidence  correlation.  These  time  dif¬ 
ferences,  i.e.  time  delays,  determine  the  angle  of  arrival. 
In  this  paper  the  effects  of  quantization  of  the  time  delays 
are  studied.  It  is  found  out  that  this  simple  method  gives 
comparable  performance  to  the  conventional,  direct  cor¬ 
relation  based  methods  in  the  case  of  a  relatively  high  sig¬ 
nal  to  noise  ratio. 

1.  Introduction 

In  this  paper  a  low-complexity  method  for  the  estima¬ 
tion  of  the  angle  of  arrival  of  a  wideband  signal  with  a 
small  array  is  developed.  The  lower  the  complexity  of  the 
angle-of-arrival  estimation  method  is,  the  simpler,  the  less 
expensive,  and  the  more  reliable  is  the  hardware  with 
which  the  method  can  be  realized.  A  small  array  was  set  as 
the  goal  in  order  to  make  possible  a  less  expensive  array 
which  could  even  be  used  as  a  part  of  portable  equipment. 

A small sensor array is defined here as follows: The number of sensors is less than 8 and the maximum distance between any two sensors is less than $20 \cdot T_s \cdot c$, where $T_s$ is the sampling interval and $c$ is the propagation velocity of the signal. There are no other restrictions on the locations of the sensors, so they can form a three-dimensional structure.

In  order  to  achieve  a  low-complexity  method  only  the 
angle  of  arrival  of  the  signal  from  the  strongest  signal 
source  is  estimated.  The  differences  in  arrival  times  of  a 
wideband  signal  received  by  spatially  separated  sensors 
can  be  estimated  one  by  one  if  the  sensor  array  is  receiving 
one  dominating  wave  propagating  signal. 

This  work  was  sponsored  by  Nokia  Foundation. 
When the signal to noise ratio is relatively high (e.g. 5 dB in the case of one signal in white Gaussian noise) the time delays with which the signal reaches different sensors can be estimated using polarity coincidence correlation [1], meaning that signals are quantized to 1-bit representation. The signal to noise ratio needed to achieve a certain variance of the time delay estimate depends on the signal and noise spectra, cross-correlations of the signal and noise at different sensors, cross-correlations of noise at different sensors, the length of the estimation window, and the sampling rate. The variances of the time delay estimates are given in closed-form expressions for general signal and noise spectra in [2].

The main advantage of the polarity coincidence correlation is the possibility to use simple 1-bit quantization of signals. As a result, simpler analog automatic gain control can be utilized when compared to multibit quantization. Polarity coincidence correlation is also computationally much simpler and less demanding than direct cross-correlation methods because it can be implemented without multiplications [3]. Also the handling of 1-bit signals requires considerably less memory than that of multibit signals.

The disadvantage of using 1-bit quantization is that interpolation of the signal values between sampled values is not possible. Therefore the time delays can be estimated only with the accuracy of one sampling interval. The error of the time delay estimates caused by rounding to the nearest multiple of the sampling interval causes errors in the estimate of the propagation vector. This kind of error, which has not been considered earlier, is studied in this paper. The study is based on modeling the rounding error of the time delay estimates by independent white noise which is uniformly distributed in the interval $[-T_s/2, T_s/2)$.

2.  Method 

In principle the introduced method for the angle of arrival estimation is as follows: First, signals received by spatially separated sensors are quantized to 1-bit representation. Then, the time delays between signals in different sensors are estimated by polarity coincidence correlation. Finally, these time delays are used to determine the angle of arrival. The block diagram of the angle of arrival estimation method is presented in Figure 1.

Figure 1. Block diagram of the angle of arrival estimation method

0-8186-7576-4/96 $5.00 © 1996 IEEE

Let us assume that the sensor array is receiving a wave propagating signal $s(t, x)$ caused by a distant event, where $t$ is the time and $x$ is a three-dimensional vector representing a location in an orthogonal coordinate system. It is assumed that $s(t, x)$ can be modeled as a sum of plane waves with common direction of propagation but with different frequency and amplitude,

$$s(t, x) = \sum_i A_i\, e^{j\omega_i (t - k^T x)} \qquad (1)$$

where $j$ is the imaginary unit, $\omega_i$ is the frequency, $A_i$ is the amplitude of the $i$th component of the wideband signal, and $k$ is a propagation vector which determines the direction and the velocity of propagation of the plane wave, $\|k\| = 1/c$. $T$ denotes matrix transpose.

Let the signal received by the $l$th sensor located at $x_l$ be

$$y_l(t) = s(t, x_l) + w_l(t) \qquad (2)$$

where $w_l(t)$ is the noise component received by the $l$th sensor. If sensors in the array are identical, then in the ideal noise-free case the only difference between signals received by different sensors is the time delay because $s(t, x)$ is a sum of plane waves with a common direction of propagation.

The time delay between the signals received by the $l$th and the $m$th sensor is

$$\tau_{lm} = k^T x_{lm} \qquad (3)$$

where $x_{lm}$, the difference between the locations of the two sensors, is called a sensor vector. The propagation vector $k$, which gives the angle of arrival, is determined by three time delays $\tau_{l_n m_n}$, $n = 1, 2, 3$ if the corresponding sensor vectors $x_{l_n m_n}$ are linearly independent. In general, $k$ is the least squares solution of the matrix equation $t_M = V_M k$ (where $M$ is the number of the time delay estimates used in the estimation of $k$), i.e.

$$\hat k_M = (V_M^T V_M)^{-1} V_M^T t_M \qquad (4)$$

where

$$V_M = [x_{l_1 m_1} \; x_{l_2 m_2} \; \cdots \; x_{l_M m_M}]^T, \quad t_M = [\tau_{l_1 m_1} \; \tau_{l_2 m_2} \; \cdots \; \tau_{l_M m_M}]^T \qquad (5)$$

provided that the rank of the matrix $V_M$ is 3.

Time delays are estimated using polarity coincidence correlation: $d_{lm}$ is taken as an estimate of the time delay $\tau_{lm}$ if it maximizes the sum

$$\sum_n \mathrm{sgn}[y_l(n)]\, \mathrm{sgn}[y_m(n + d_{lm})]. \qquad (6)$$

The estimated delays $d_{lm}$ can only take values which are multiples of the sampling interval because interpolation of the signals quantized to 1-bit representation is not possible. The error of the time delay estimate caused by rounding to the nearest multiple of the sampling interval can be modeled as white noise which is uniformly distributed in the interval $[-T_s/2, T_s/2)$. The error is assumed to be statistically independent of the time delay to be rounded and other time delays.
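The delay search by polarity coincidence correlation can be sketched as follows. The sign convention (a positive lag means the second signal lags the first), the test signal, and the function name `pcc_delay` are assumptions made for illustration; the original equation (6) may define the lag with the opposite sign.

```python
import numpy as np

def pcc_delay(y_l, y_m, d_max):
    """Integer lag d in [-d_max, d_max] maximising sum_n sgn(y_l[n]) * sgn(y_m[n + d])."""
    s_l = np.where(y_l >= 0, 1, -1)                 # 1-bit quantisation
    s_m = np.where(y_m >= 0, 1, -1)
    n = np.arange(d_max, len(y_l) - d_max)          # keeps n + d inside the window
    lags = np.arange(-d_max, d_max + 1)
    scores = [np.sum(s_l[n] * s_m[n + d]) for d in lags]
    return int(lags[int(np.argmax(scores))])

rng = np.random.default_rng(0)
# Wideband test signal: white noise lightly smoothed by a 4-tap moving average.
x = np.convolve(rng.standard_normal(5000), np.ones(4) / 4, mode="same")
true_delay = 7
y_l = x + 0.1 * rng.standard_normal(x.size)
y_m = np.roll(x, true_delay) + 0.1 * rng.standard_normal(x.size)
d_hat = pcc_delay(y_l, y_m, d_max=20)
```

Since the quantised samples are ±1, each product in the sum is a coincidence test, which in hardware reduces to an XNOR and a conditional counter increment rather than a multiplication.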

If all pairs of the vectors $x_{lm}$ contained in the matrix $V_M$ are linearly independent, meaning that no pair of vectors have equal direction, then the error of the time delay can also be assumed to be statistically independent of the errors of the other time delays. In this case the covariance matrix of the rounding error of the estimate of the time delay vector is

$$\mathrm{cov}\{\Delta t_M\} = \frac{T_s^2}{12} I, \qquad (7)$$

where $I$ is the identity matrix and

$$\Delta t_M = [\Delta\tau_{l_1 m_1} \; \cdots \; \Delta\tau_{l_M m_M}]^T. \qquad (8)$$

The rounding of the time delays to the nearest multiple of the sampling interval causes errors in the estimate of the propagation vector $k$. The covariance matrix of that kind of error can now be estimated because $k$ is calculated by matrix multiplication of the time delay vector,

$$\mathrm{cov}\{\Delta k_M\} = \frac{T_s^2}{12} W_M W_M^T, \qquad (9)$$

where

$$W_M = (V_M^T V_M)^{-1} V_M^T. \qquad (10)$$

The more time delay estimates are taken into account in the calculation of the estimate of the propagation vector $k$, the smaller are the variances of the error of the components of $\hat k_M$.
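The effect of the rounding error on the propagation vector can be checked numerically. The sketch below assumes a least-squares estimate $\hat k = (V^T V)^{-1} V^T \hat t$ and independent rounding errors of variance $T_s^2/12$, which gives an error covariance of $(T_s^2/12)(V^T V)^{-1}$. It uses a regular tetrahedron with edge length $20 \cdot T_s \cdot c$, matching the four-sensor array described in the simulations; the orientation of the tetrahedron and the normalised units are assumptions, so only orientation-independent quantities are compared with Table 1.

```python
import itertools
import numpy as np

Ts, c = 1.0, 1.0                                    # normalised units (assumption)
d = 20 * Ts * c                                     # edge length of the array

# Regular tetrahedron with edge length d; the orientation is arbitrary.
X = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
X *= d / (2 * np.sqrt(2))

pairs = list(itertools.combinations(range(4), 2))   # sensor pairs 12, 13, 14, 23, 24, 34
V6 = np.array([X[m] - X[l] for l, m in pairs])      # all six sensor vectors
V3 = V6[:3]                                         # sensor vectors 12, 13, 14 only

def rounding_cov(V, Ts):
    """Covariance of the least-squares estimate under uniform rounding errors."""
    return Ts**2 / 12 * np.linalg.inv(V.T @ V)

cov6 = rounding_cov(V6, Ts)
cov3 = rounding_cov(V3, Ts)
```

With six delays the covariance is isotropic, about $0.1042 \times 10^{-3}/c^2$ on the diagonal, and with three delays the trace is about $0.9375 \times 10^{-3}/c^2$, in line with the values reported in Table 1.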
The computational load of the proposed angle-of-arrival method depends on the complexity of the estimation of the cross-correlation functions, the maximum allowed time delay and the number of the time delay estimates $M$ used. When nonoverlapping estimation windows and polarity coincidence correlation are used, one conditional counter increment per sample is needed to estimate one value of the cross-correlation function. When the conventional direct correlation method is used, one multiplication and one addition per sample are needed. $(2d_{max} + 1)$ estimated values of the cross-correlation function are needed for each of the $M$ time delay estimates used,
where

=  (11) 

\  S  J 

After estimating the cross-correlation functions, their maxima are located to estimate the time delays. Finally the propagation vector $k$ which determines the angle of arrival is calculated by a multiplication of a $3 \times M$ matrix and a vector of length $M$, see equation (10).

3.  Simulation  Results 

A grid of four sensors with equal distance $20 \cdot T_s \cdot c$ between all sensors (see Figure 2) is used as the array in the simulations.

Distribution of the rounding errors of the time delay estimates was verified by simulations. Randomly generated vectors $k(n)$, $n = 1, 2, \ldots, 100\,000$ were used as a test sequence. The vectors $k(n)$ were generated as follows: First three-dimensional vectors were generated, whose components were independent and uniformly distributed in the interval [-1,1]. Then the vectors were scaled so that the norm of each vector was $1/c$. The true values of all the six possible time delays were formed with the equation

$$t_6 = V_6 k \qquad (12)$$

where

$$V_6 = [x_{12} \; x_{13} \; x_{14} \; x_{23} \; x_{24} \; x_{34}]^T, \quad t_6 = [\tau_{12} \; \tau_{13} \; \tau_{14} \; \tau_{23} \; \tau_{24} \; \tau_{34}]^T. \qquad (13)$$

After that the time delays were rounded to the nearest multiple of the sampling interval. Then the error between the original and rounded time delays was computed. The histogram of the error values is shown in Figure 3. The assumption of uniform distribution holds very well.

Figure 3. Histogram of rounding errors of time delays

The errors of the estimate of $k$ caused by the rounding of the time delays to the nearest multiple of the sampling interval were formed by using 3 and 6 rounded time delay estimates in calculating the estimate of $k$. In the case of 3 time delays

$$V_3 = [x_{12} \; x_{13} \; x_{14}]^T, \quad t_3 = [\tau_{12} \; \tau_{13} \; \tau_{14}]^T \qquad (14)$$

were used.

The sample covariance matrices and the covariance matrices of the errors $\Delta k_3$ and $\Delta k_6$ formed with equations (9) and (10) are shown in Table 1. The simulated values are quite close to the values given by the equation (9). It is noticed by comparing the covariance matrices of the errors $\Delta k_3$ and $\Delta k_6$ that in this case the use of 6 time delay estimates instead of 3 time delay estimates halved the variance of the error of the components of the estimated propagation vector.

            sample covariance                  covariance from (9)
            (divided by 10^-3 / c^2)           (divided by 10^-3 / c^2)

Δk_3         0.2080  -0.1207  -0.0848           0.2083  -0.1203  -0.0851
            -0.1207   0.3490  -0.0482          -0.1203   0.3472  -0.0491
            -0.0848  -0.0482   0.3809          -0.0851  -0.0491   0.3819

Δk_6         0.1043   0.0001  -0.0003           0.1042   0.0000   0.0000
             0.0001   0.1046   0.0001           0.0000   0.1042   0.0000
            -0.0003   0.0001   0.1046           0.0000   0.0000   0.1042

Table 1. Covariance matrices of the error of the estimated propagation vectors

The proposed angle-of-arrival method was then simulated. The test signal used was a sum of 5 sinusoids with frequencies of $0.02\pi$, $0.06\pi$, $0.1\pi$, $0.26\pi$, and $0.46\pi$ with respect to the sampling frequency $2\pi$ and with a common amplitude. The test signal was assumed to propagate as a plane wave. The same signal but propagating as a plane wave to a different direction was used as an interfering signal. The power of the interfering signal, $v_I(n)$, was increasing during simulation,

$$v_I(n) = 10^{[\ldots]}\, v_s, \qquad (15)$$

where $v_s$ is the power of the test signal. The signals received by the sensors were the sum of the test signal, the interfering signal and white independent Gaussian noise with variance $0.1 v_s$. All the six possible time delays were estimated using polarity coincidence correlation presented in equation (6) with nonoverlapping estimation windows with a length of $10^{[\ldots]}$ samples. The length of the test signal was $10^{[\ldots]}$ samples. The estimate of the propagation vector $k$ was formed with equation (10). The components of the vector $c\hat k$ as a function of time are presented in Figure 4.

In this simulation the method tracks the signal from the strongest signal source when the power of the stronger signal is about 2.5 times the power of the weaker signal. The results achieved by using direct correlation instead of polarity coincidence correlation, i.e. using

$$\sum_n y_l(n)\, y_m(n + d_{lm}) \qquad (16)$$

instead of equation (6), are also presented in Figure 4. The proposed method gives comparable performance to the conventional correlation based method.

4.  Conclusions 

A  low-complexity  method  for  estimation  of  the  angle 
of  arrival  of  the  signal  from  the  strongest  signal  source 
was  introduced.  The  method  is  based  on  the  differences  in 
the  arrival  times  of  the  signal  at  different  sensors.  These 
differences  are  estimated  using  polarity  coincidence  corre¬ 
lation  and  as  a  consequence,  1-bit  quantization  can  be 
used.  Because  of  this  the  required  amount  of  calculation  is 
significantly  reduced  when  compared  to  conventional 
methods  without  noticeable  differences  in  the  performance 
when  the  signal  to  noise  ratio  is  relatively  high.  The  intro¬ 
duced  method  also  makes  possible  quite  complex  3-D  sen¬ 
sor  placements  if  necessary.  These  characteristics  make 
the  introduced  method  very  easy  and  cheap  to  implement, 
and  robust  to  operate.  Therefore  the  method  is  suitable  for 
low-cost  applications,  where  it  is  sufficient  to  find  out  only 
the  angle  of  arrival  of  the  strongest  signal  source. 
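The idea of estimating a time difference from 1-bit data can be sketched as follows. This is a minimal illustration, not the paper's equation (6): the function name `pcc_delay`, the test signal, the noise level and the lag range are all assumptions chosen for the example. The estimator simply counts how often the signs of two sensor signals coincide at each candidate lag and picks the lag with the highest coincidence rate.

```python
import numpy as np

def pcc_delay(x, y, max_lag):
    """Estimate the delay of y relative to x (in samples) using only signs.

    Hypothetical helper: for each candidate lag, count the fraction of
    samples where sign(x[k]) equals sign(y[k + lag]).
    """
    sx = np.sign(x)
    sy = np.sign(y)
    n = len(sx)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = sx[: n - lag], sy[lag:]
        else:
            a, b = sx[-lag:], sy[: n + lag]
        scores.append(np.mean(a == b))  # polarity coincidence rate
    return int(lags[int(np.argmax(scores))])

# Assumed test setup: white Gaussian source, 7-sample delay, additive noise.
rng = np.random.default_rng(0)
s = rng.standard_normal(10_000)
true_delay = 7
x = s + 0.3 * rng.standard_normal(s.size)
y = np.roll(s, true_delay) + 0.3 * rng.standard_normal(s.size)
estimated_delay = pcc_delay(x, y, max_lag=20)
```

Because only sign comparisons and counting are involved, no multiplications are needed in the inner loop, which is the source of the complexity reduction claimed above.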


vectors as a function of time; circles: estimated
with polarity coincidence correlation; crosses:
estimated with direct correlation; dashed line:
the propagation vector of the test signal; solid
line: the propagation vector of the interfering
signal
Abstract

Performance analysis shows the asymptotic optimality
of the MUSIC technique applied to bearing estimation
problems for a sufficiently large number of sensors and
not fully coherent sources (see for instance [1, 2]). This
implies that a large number of covariance lags has to be
computed; moreover, the computational load of the
eigen-decomposition of large covariance matrices may be too
severe for practical applications.

With reference to uniformly spaced linear arrays
(ULAs), in this paper we show that the accuracy gain
associated with an increased number of sensors can
alternatively be obtained by applying the MUSIC technique to
particular configurations of pairs of ULAs, referred to
as subarrays, using a significantly smaller number of
sensors.

It  is  also  shown  that  the  accuracy  loss  of  the  proposed 
method,  w.r.t.  a  full  ULA  covering  the  same  array  aperture, 
can  be  minimized  by  varying  the  distance  between  the  two 
subarrays. 

The simulation results provided show the applicability
of the proposed method.


1  Introduction 

The classical problem in array signal processing is to
estimate the directions of arrival (DOA) of plane waves with
an array of sensors. Among others, the MUSIC technique
[4] has become popular due to its simple formulation, easy
implementation and high statistical efficiency.

Moreover, performance analysis (see for instance [1, 2])
shows the asymptotic optimality of the MUSIC technique
for a sufficiently large number of sensors and not fully
coherent sources. On the other hand, a larger array implies
the computation of more covariance lags and a heavier
computational load for the eigen-decomposition.


Looking at the application of high-resolution DOA
estimation methods based on subspace decomposition, such as
MUSIC, ESPRIT, etc., to reduced covariance matrices, we
analyze here a sensor arrangement in pairs of ULAs; each
ULA will be referred to as a subarray. We will show that
the accuracy of the MUSIC technique is essentially preserved
when the two subarrays are optimally displaced from each other,
while the number of array elements is significantly reduced.

This feature is substantially due to the fact that the array
manifold associated with the proposed configurations of pairs
of subarrays retains the same slope near the intersections
with the signal subspace when the two subarrays are
optimally displaced. In fact, the degradation of the estimation
variance is limited by the effective aperture enlargement.

To demonstrate this circumstance, we employ the
analytical expression of the estimation variance given in [1],
particularized to the specific array manifold, as a function
of the subarray splitting distance. Moreover, by following
the guidelines indicated in [2], we evaluate the angular
distance between the estimated and the true signal subspaces.
Interestingly, the splitting distance yielding the minimum
angular distance between the estimated and the true signal
subspaces does not coincide with the splitting distance
yielding the minimum estimation variance. Following the
idea presented in [3], where a linear prediction method is
applied to a pair of subarrays in which the reference subarray
consists of a single sensor, we show that the optimum
splitting distance is still obtained by using a generalized
minimum prediction error variance criterion when the
reference subarray is formed by more than one sensor.

Finally,  simulation  results  are  provided  to  show  the  ap¬ 
plicability  of  the  MUSIC  to  the  proposed  configuration  of 
pairs  of  subarrays. 

2  Performance  Analysis  of  MUSIC  applied 
to  subarrays 

For  reference,  let  us  refer  to  a  ULA  of  M  sensors  spaced 
d  meters  apart.  A  general  configuration  of  a  pair  of  sub- 
arrays  consists  in  forming  the  first  subarray  with  the  first 
$K_1$ sensors and the second subarray with the last $K_2$
sensors at the other endpoint. Let us set $K = K_1 + K_2$. The
subarray distance is $D = \Delta \cdot d$, where $\Delta = M - K + 1$.

In other words, the subarray configuration is obtained by
powering off $\Delta - 1$ sensors in the middle of an $M$-sensor
ULA, so as to maintain the overall array aperture $(M - 1)d$.
The manifold associated with this sensor configuration is
described by the steering vector


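$$\mathbf{a}^H(\omega) = \left[1,\ e^{-j\omega},\ \ldots,\ e^{-j(K_1-1)\omega},\ e^{-j(K_1+\Delta-1)\omega},\ \ldots,\ e^{-j(M-1)\omega}\right]$$

where $\omega = 2\pi d \sin(\theta)/\lambda$, $\theta$ is the generic DOA¹ and $\lambda$ is
the wavelength.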

When $L$ sources are considered, the matrix of the
steering vectors (sometimes referred to as the mixing matrix)
is²

$$\mathbf{A} = [\mathbf{a}(\omega_1), \ldots, \mathbf{a}(\omega_L)] \qquad (1)$$

and the vector of the array sensor measurements
$\mathbf{x} = [x_1, \ldots, x_{K_1}, x_{K_1+\Delta}, \ldots, x_M]^T$ is given by

$$\mathbf{x} = \mathbf{A} \cdot \mathbf{f} + \mathbf{w} \qquad (2)$$

where $\mathbf{f} = [f_1, \ldots, f_L]^T$ is the vector of $L < M$
independent, zero-mean, circularly complex, Gaussian distributed
sources, and $\mathbf{w} = [w_1, \ldots, w_K]^T$ is the vector of
observation noises, circularly complex, zero-mean and Gaussian
distributed, independent of the sources $\mathbf{f}$. Naturally, the
observation model (2) is formally equivalent to the
observation model drawn from the full array configuration; the
mixing matrix $\mathbf{A}$ takes into account the actual form of the
array manifold, i.e. how the sensors are located along the
receiver.

The main advantage of using the subarrays configuration
consists in maintaining the same end-to-end array aperture
$(M - 1)d$ while using $K$ sensors instead of $M$, thus allowing
for a significant computational saving. This is paid for with
a loss in estimation accuracy.

To evaluate the accuracy of the estimation carried out by
applying the MUSIC technique to the subarrays configuration,
we start by recalling here the (approximate) expression of
the mean squared value of the distance between the signal
subspace, i.e., the range of the mixing matrix $\mathbf{A}$, and an
estimate of the signal subspace obtained from the
eigen-decomposition of the sample covariance matrix

$$\hat{\mathbf{R}}_x = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}(i)\,\mathbf{x}^H(i)$$

¹ In the sequel, the parameter $\omega$ will also be, improperly, referred to as
DOA.

² Strictly speaking, the matrix $\mathbf{A}$ is parameterized by the vector
$\boldsymbol{\omega} = [\omega_1, \ldots, \omega_L]^T$, and it should be denoted as $\mathbf{A}(\boldsymbol{\omega})$. To avoid
too cumbersome a notation, in the sequel we will omit the dependence on
$\boldsymbol{\omega}$, writing simply $\mathbf{A}$.


where $N$ is the number of available independent snapshots
$\mathbf{x}(i)$.

Denoting by $\alpha$ the angle between the subspaces, we report
here the following expression of the mean squared value
$E\{(1 - \cos\alpha)^2\}$ of the subspace distance, drawn from
[6] (with some rearrangement of terms), in the case of
two uncorrelated sources and white noise


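$$E\{(1 - \cos\alpha)^2\} \simeq \frac{1}{N}\,\frac{2(K-2)}{K^2}\,\frac{1}{\mathrm{SNR}^2}\left[\frac{1 + K\cdot\mathrm{SNR}\,(1+x)}{(1+x)^2} + \frac{1 + K\cdot\mathrm{SNR}\,(1-x)}{(1-x)^2}\right] \qquad (3)$$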


where $\mathrm{SNR} = (P_1 + P_2)/2\sigma^2$ is the signal-to-noise ratio
for sources with power $P_1$ and $P_2$ and white Gaussian noise
with variance $\sigma^2$.

The parameter $x$ depends on the array manifold as
follows:

$$x = \sqrt{(1 - A) + A\cdot|\phi|^2} \qquad (4)$$

where $A = 4P_1P_2/(P_1 + P_2)^2$ depends only on the source
powers and $\phi = \mathbf{a}^H(\omega_1)\,\mathbf{a}(\omega_2)/K$ is the (normalized) scalar
product of the steering vectors $\mathbf{a}(\omega)$ evaluated at the true
DOAs $\omega_1$ and $\omega_2$, spaced $\delta\omega = (\omega_2 - \omega_1)/2$ apart.

The correlation coefficient $\phi$ relates the angular
closeness of the sources and the form of the array manifold, and
it assumes the following form

$$\phi = \frac{1}{K\sin(\delta\omega)} \cdot \bigl(\sin(K_1\delta\omega) + \sin \ldots$$
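The normalized scalar product of two steering vectors can be checked numerically from its definition. The sketch below assumes half-wavelength spacing, sensor positions 0, …, K₁−1 and M−K₂, …, M−1 (in units of d), and steering-vector entries exp(jωn); the helper names are hypothetical. The closed form used for comparison is derived here by summing the two geometric series and is an assumption of this sketch, not a transcription of the expression above.

```python
import numpy as np

def sensor_positions(K1, K2, delta):
    """Positions (in units of d) of the active sensors of the two subarrays."""
    M = K1 + K2 + delta - 1
    return np.concatenate([np.arange(K1), np.arange(M - K2, M)])

def phi_numeric(w1, w2, K1, K2, delta):
    """phi = a^H(w1) a(w2) / K, computed directly from the steering vectors."""
    pos = sensor_positions(K1, K2, delta)
    a1 = np.exp(1j * w1 * pos)
    a2 = np.exp(1j * w2 * pos)
    return np.vdot(a1, a2) / pos.size  # vdot conjugates its first argument

def phi_closed_form(w1, w2, K1, K2, delta):
    """Same quantity from the two geometric sums (derived for this sketch)."""
    d = (w2 - w1) / 2.0
    M = K1 + K2 + delta - 1
    K = K1 + K2
    first = np.exp(1j * (K1 - 1) * d) * np.sin(K1 * d)
    second = np.exp(1j * (2 * M - K2 - 1) * d) * np.sin(K2 * d)
    return (first + second) / (K * np.sin(d))

# Assumed example: two sources at +3 and -3 degrees, K1 = K2 = 5, delta = 15.
w1, w2 = np.pi * np.sin(np.deg2rad([3.0, -3.0]))
phi_a = phi_numeric(w1, w2, 5, 5, 15)
phi_b = phi_closed_form(w1, w2, 5, 5, 15)
```

Evaluating `abs(phi_numeric(...))` over a range of `delta` values gives a quick view of how the splitting distance changes the correlation between the two steering vectors.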

The absolute minima of $E\{(1 - \cos\alpha)^2\}$ are obtained³ for
$\phi = 0$, i.e., by choosing an array manifold in which the
steering vectors associated with the sources are orthogonal.
Despite this fact, good signal subspace estimation does
not result in a minimization of the estimation variance.
This is due to the fact that the definition of the angular
distance between subspaces relies on the maximum angle
formed by vectors belonging to the subspaces, while the
estimation variance also depends on the local slope of the
array manifold measured at the intersection with the signal
subspace; an appreciable rotation of the estimated subspace is
allowed, leaving the estimation variance essentially unaffected.

This  fact  is  also  deduced  by  looking  at  the  expression 
of  the  mean  squared  estimation  error,  reported  in  [1], 


(6) 


³ This is due to the monotonically increasing behaviour of (3) in the interval
$|\phi| < 1$ or, equivalently, $1 - A < x < 1$.
where

$$\mathbf{d}(\omega) = \frac{d}{d\omega}\,\mathbf{a}(\omega)$$

is the vector of the derivatives of the components of the
steering vector, the vectors $\mathbf{s}_k$ are the eigenvectors of the
true covariance matrix spanning the signal subspace, $\lambda_k$
are the associated eigenvalues, and the vectors $\mathbf{g}_k$ are the
eigenvectors spanning the noise subspace, i.e.,

$$\mathbf{R}_x = E\{\mathbf{x}\cdot\mathbf{x}^H\} = \sum_{k=1}^{L} \lambda_k\,\mathbf{s}_k\,\mathbf{s}_k^H + \sigma^2\sum_{k=1}^{K-L} \mathbf{g}_k\,\mathbf{g}_k^H$$

a linear prediction framework. A similar approach has
also been employed in [5] in speech compression applications.
Specifically, denoting by

$$\mathbf{x}_1 = [x_1, \ldots, x_{K_1}]^T; \qquad \mathbf{x}_2 = [x_{K_1+\Delta}, \ldots, x_M]^T$$

the measurements drawn from the first subarray and the
second subarray, respectively, the linear prediction problem
is solved by determining the matrix $\mathbf{P}$ which minimizes the
sum of the variances of the prediction errors

$$\mathbf{e} = \mathbf{x}_2 - \mathbf{P}\cdot\mathbf{x}_1 \qquad (9)$$


For two uncorrelated sources the expression of the
eigenvalues is

$$\lambda_{1,2} = \frac{K}{2}\left((P_1 + P_2) \pm \sqrt{(P_1 - P_2)^2 + 4P_1P_2|\phi|^2}\right) + \sigma^2$$

and the (non-normalized) eigenvectors $\mathbf{s}_k$ are

si,2  =  a(a;i)  +  ^i|^^-a(a;2)  (7) 


By carefully looking at (6), we see that the numerator
depends on $\phi$ in the same way as (3), but the dependence
of the denominator is still not clear. To better investigate
the dependence on $\phi$, (6) has to be put in a more suitable
form.

To this end, let us consider two sources having the same
power, i.e. $P_1 = P_2 = P$. After some algebraic
manipulations, we obtain


(l-|.^|^)-hNSR/ii: 


d 


OSlj" 


(8) 


where NSR = 1/SNR, overbar denotes complex
conjugation, and

$$K\cdot I = 1 + 4 + 9 + \cdots + (K_1-1)^2 + (K_1+\Delta-1)^2 + \cdots + (M-1)^2$$

$$K\cdot J = 1 + 2 + 3 + \cdots + (K_1-1) + (K_1+\Delta-1) + \cdots + (M-1)$$


We can see that now $\phi = 0$ maximizes the numerator, while
the denominator depends on the derivative of $\phi$ w.r.t. the
DOA spacing $\delta\omega$. This shows that the subspace distance
criterion cannot be used to determine a splitting distance
which minimizes the estimation error.

To  this  purpose,  we  follow  here  an  alternative  approach 
based  on  the  minimization  of  the  variance  of  the  prediction 
error,  as  indicated  in  [3]  where  the  particular  case  of  a 
subarray  formed  by  one  sensor  only  has  been  addressed  in 


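The minimum value is readily found as

$$\sigma_e^2 = E\{\mathbf{e}^H\mathbf{e}\} = \mathrm{Trace}\{\mathbf{R}_2 - \mathbf{R}_{21}\mathbf{R}_1^{-1}\mathbf{R}_{21}^H\} \qquad (10)$$

where $\mathbf{R}_1 = E\{\mathbf{x}_1\mathbf{x}_1^H\}$ is the covariance matrix of the
measurements of the first subarray, $\mathbf{R}_2 = E\{\mathbf{x}_2\mathbf{x}_2^H\}$ is
the covariance matrix of the measurements of the second
subarray and $\mathbf{R}_{21} = E\{\mathbf{x}_2\mathbf{x}_1^H\}$ is the cross-covariance
matrix between the two subarrays. The absolute minimum
of $\sigma_e^2$ is found by systematically varying $\Delta$.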
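The search over the splitting distance can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' implementation: it uses the exact (model) covariance of two equal-power uncorrelated sources in white noise, half-wavelength spacing, and hypothetical helper names; the source angles, powers and noise variance are chosen only for the example.

```python
import numpy as np

def subarray_covariance(K1, K2, delta, omegas, powers, noise_var):
    """Model covariance R = A P A^H + sigma^2 I for the two-subarray geometry."""
    M = K1 + K2 + delta - 1
    pos = np.concatenate([np.arange(K1), np.arange(M - K2, M)])
    A = np.exp(1j * np.outer(pos, omegas))  # K x L mixing matrix
    return A @ np.diag(powers) @ A.conj().T + noise_var * np.eye(pos.size)

def prediction_error_variance(R, K1):
    """Trace{R2 - R21 R1^{-1} R21^H}: minimum sum of prediction error variances."""
    R1 = R[:K1, :K1]    # covariance of the first subarray
    R2 = R[K1:, K1:]    # covariance of the second subarray
    R21 = R[K1:, :K1]   # cross-covariance E{x2 x1^H}
    E = R2 - R21 @ np.linalg.solve(R1, R21.conj().T)
    return float(np.real(np.trace(E)))

# Assumed scenario: sources at +3 and -3 degrees, K1 = K2 = 5, sigma^2 = 0.1.
omegas = np.pi * np.sin(np.deg2rad([3.0, -3.0]))
deltas = np.arange(1, 31)
variances = [
    prediction_error_variance(
        subarray_covariance(5, 5, int(d), omegas, [1.0, 1.0], 0.1), 5
    )
    for d in deltas
]
best_delta = int(deltas[int(np.argmin(variances))])
```

In practice the model covariance would be replaced by a sample covariance; `np.linalg.solve` is used here instead of an explicit inverse purely for numerical convenience.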

3  Conclusion 

To show how the results of the previous section can
be used in practical applications, simulation results are
reported in fig. 1, where the mean squared error of
the Root-MUSIC estimation of the DOAs $\omega_1 = -\omega_2 = 3$
degrees, when a subarray configuration with $K_1 = K_2 = 5$ is
considered, is plotted vs. the subarray splitting distance
$\Delta$. For comparison purposes, the results relative to the full
array with $M = K_1 + K_2 + \Delta - 1$ sensors are also shown.
The array spacing is half a wavelength, $2d = \lambda$, three values
of  SNR  =  ”5,  7,  15  dB  are  considered,  the  number  of 
snapshots is $N = 100$ and averaging over 100 Monte Carlo
runs has been carried out.

We see that the optimum performance of the subarrays
configuration is obtained for $\Delta \approx 15$ for all the SNR values
and  the  given  source  spacing  6(j  =  &  degrees,  correspond¬ 
ing  to  performance  of  a  full  array  of  M  =  24  sensors,  while 
only  K=10  sensors  are  used  in  the  subarrays  configura¬ 
tion. 

In figs. 2 and 3, we report the minimum prediction error
variance (10) and the subspace distance (3) as functions
of the subarray splitting distance $\Delta$, respectively. We see
that the optimum splitting distance is correctly indicated by
the minimum prediction error criterion, while the subspace
distance shows a kind of phase-opposition behaviour
in indicating the optimum splitting distance. This is due
to the lack of dependence on the derivative of $\phi$ w.r.t. the
DOA spacing $\delta\omega$, which is inherently taken into account
in the minimum prediction error criterion.
This fact suggests some opportunities for practical
use of the ideas presented here. For instance, this
computation-saving technique can be used in tracking pairs of
sources after a full-array discovery stage by applying the
following two-step procedure:

• with the subarray sizes $K_1$ and $K_2$ fixed, search over $\Delta$
for the minimum prediction error. This implies
inversion of a reduced-order covariance matrix $\mathbf{R}_1$, which
can be suitably carried out using the Levinson recursion
in the case of a uniformly spaced linear subarray $\mathbf{x}_1$;

• apply the root-MUSIC technique (or other techniques
based on subspace decomposition) to the $K \times K$
covariance matrix of the subarrays configuration.
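The second step can be sketched as follows. This example swaps in a grid-search (spectral) MUSIC in place of root-MUSIC, and uses the exact model covariance instead of a sample covariance, so it illustrates the subspace step only; the function name `music_doa`, the grid resolution and the scenario values are assumptions of this sketch.

```python
import numpy as np

def music_doa(R, pos, n_sources, grid_deg):
    """Spectral MUSIC for sensors at positions `pos` (units of d = lambda/2)."""
    _, vecs = np.linalg.eigh(R)                    # eigenvalues in ascending order
    En = vecs[:, : pos.size - n_sources]           # noise-subspace eigenvectors
    omega = np.pi * np.sin(np.deg2rad(grid_deg))   # electrical angle per grid point
    A = np.exp(1j * np.outer(pos, omega))          # steering vectors on the grid
    null = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    spectrum = 1.0 / (null + 1e-12)                # small constant avoids division by zero
    inner = spectrum[1:-1]
    is_peak = (inner > spectrum[:-2]) & (inner > spectrum[2:])
    peak_idx = np.flatnonzero(is_peak) + 1
    best = peak_idx[np.argsort(spectrum[peak_idx])[-n_sources:]]
    return np.sort(grid_deg[best])

# Assumed scenario: K1 = K2 = 5, delta = 15, sources at +3 and -3 degrees.
K1, K2, delta = 5, 5, 15
M = K1 + K2 + delta - 1
pos = np.concatenate([np.arange(K1), np.arange(M - K2, M)])
omegas = np.pi * np.sin(np.deg2rad([3.0, -3.0]))
A_true = np.exp(1j * np.outer(pos, omegas))
R = A_true @ A_true.conj().T + 0.1 * np.eye(pos.size)  # unit-power sources, sigma^2 = 0.1
grid = np.linspace(-90.0, 90.0, 1801)
estimates = music_doa(R, pos, n_sources=2, grid_deg=grid)
```

Only a 10 × 10 eigen-decomposition is needed here, although the aperture corresponds to a 24-sensor ULA. With a sample covariance and a large splitting distance, secondary peaks of the spectrum can become comparable to the true ones.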

The drawback is the increased ambiguity
of the subarrays configuration, which is clearly shown in
fig. 1 for $\Delta$ large enough. In essence, this technique can be
defined as a MUSIC interferometric method.

This technique can also be used in a multiple invariance
ESPRIT context. This is a matter of current investigation,
along with a more detailed theoretical analysis.
Figure 1: Comparison of performance of
MUSIC applied to a full array (continuous
curves) and to a subarrays configuration
(dashed curves) as a function of the splitting
distance $\Delta$ and for various SNR values.


Figure 2: Error prediction variance vs. the
splitting distance.


Figure  3:  MSE  of  the  distance  between  the 
true  and  the  estimated  signal  subspaces  vs. 
the  splitting  distance. 
Abstract

In many implementations of digital delay and sum
beamforming, a sample rate much higher than the Nyquist rate
is used. This allows for many synchronous beamsteering
directions. However, severe demands are made upon the
analogue-to-digital converters. Several methods have been
proposed for reducing the sample rate required. These methods
incorporate the delays that are needed for beamforming in the
time domain [3], [4] or in the frequency domain [5]. A more
efficient method for implementing a time-domain delay and sum
beamformer using polyphase decomposition is presented in
this paper. This method results in significant computational
savings when the desired angular resolution is high
compared to the number of sensors used and the number of
simultaneously formed beams.


1.  Introduction 

Conventional continuous-time beamformers delay all
sensor outputs so that propagation delays are cancelled and
the sensor outputs can be combined coherently. In a
discrete-time beamformer, these delays are performed digitally.
Using discrete time delays only allows for delaying over an
integer multiple of the sampling time period. Therefore, the
number of synchronous beam-pointing directions is small
for low sample rates, resulting in a poor angular resolution.
To illustrate this, it is shown in Section 2 that a linear
array sampled at $v$ times the Nyquist rate can only be steered
to $1 + 2v$ synchronous angles. A signal arriving from a
non-synchronous direction can be received by steering the
beam to the nearest synchronous angle or by rounding
the delays needed for beamforming to the delays available.
Both methods introduce severe distortion and poor spatial
discrimination for small $v$. In Section 3 the concept of
interpolation beamforming is discussed. This technique uses
interpolation, so that the sampling rate is increased artificially.
In this way, delays are obtained which are a fraction of the
unit delay [2]. In Section 4 an efficient method is presented
for the implementation of the interpolation beamformer
using polyphase decomposition. The resulting complexity is
discussed in Section 5 and a numerical example is shown in
Section 6.

2.  Linear  Sensor  Array  Beamforming 

Although the method to be presented can be applied to all
array geometries, an example of the steering capabilities of
a linear array is discussed. For a linear array the anticipated
propagation delay of a plane wave front from the first sensor
to the $i$-th sensor equals

$$\tau_i = \frac{i\,d\,\sin\theta}{c}, \qquad (1)$$

where $d$ is the sensor inter-distance, $\theta$ is the direction of
arrival (DOA) relative to broadside (the direction
perpendicular to the line array), $c$ is the wave propagation
velocity and $\tau_{\max}$ is the propagation delay of the wave front
between the first and the last sensor. The wave front incident
on the linear array is depicted in Figure 1. The sensor outputs


Figure  1.  Wave  front  incident  to  a  linear  array 

are sampled at the rate $f_s = 1/T_s$ and consequently the beam
can only be steered to the angles which yield a difference
in propagation delay of $uT_s$ seconds between neighboring
sensors, with $u$ an integer. This delay is cancelled using
time-delays $uT_s = d\sin\theta_u/c$. The beam can thus be steered to

$$\theta_u = \sin^{-1}\left(\frac{u\,c\,T_s}{d}\right). \qquad (2)$$

A common choice for $d$ that prevents spatial aliasing is
$d = \lambda_0/2$, $\lambda_0$ being the minimum wavelength of the signal
to be received. For this sensor inter-distance and a sample
rate equaling $f_s = 2vf_0$, it follows from (2) that $|u| \le v$.
There exist $1 + 2\lfloor v \rfloor$ different $u$ that obey this equation,
so the beam can be steered to $1 + 2\lfloor v \rfloor$ different angles.
For example, when the sample rate $f_s$ equals $4f_0$ ($v = 2$),
it follows from (2) that the beam can only be steered to
0°, ±30° and ±90°. A higher angular resolution can be
achieved by interpolating the sampled data. When $v = 2$
and more than 5 synchronous directions are desired, the
data can be interpolated by a factor $M$. This is depicted in
Figure 2 for $M = 4$. The solid lines indicate the sampled
This filter is a finite impulse response (FIR) filter with impulse
response $h[k'T_s']$. The $\tilde{x}_i$ are delayed over $p_iT_s'$ seconds, to
compensate for the anticipated propagation delays, where
the $p_iT_s'$ are equal to the $\tau_i$. Then down-sampling is used to
obtain

$$x_i'[kT_s] = \tilde{x}_i[(kM - p_i)T_s']. \qquad (5)$$

The beamformer output is obtained by summing the
shaded $x_i'$. Shading means multiplying the $x_i'$ with weights
to enhance the angular discrimination. To simplify notation,
the shading is not mentioned explicitly in figures and
equations. Multiple beams can be formed from the interpolated
sensor outputs $\tilde{x}_i$ without performing additional
multiplications.

Pridham and Mucci [3] argued that the scheme of Figure
3 is equal to the scheme in Figure 4 for the case that only
one beam is formed. This can be seen by interchanging
the filter $H$ and the delays $p_iT_s'$ and placing the filter $H$
and the down-sampling in Figure 3 after the summation.
This is allowed when all filters are identical, linear and time
invariant. Furthermore, the filter and the down-sampling
may be combined to reduce complexity. It will be shown
in Section 5 that the complexity of the technique proposed
in the following section is lower than that of the scheme in
Figure 4 for high angular precision beamformers. Note that
forming multiple beams is not possible without performing
additional multiplications with this scheme.


Figure  2.  Data  interpolation 

data  and  the  dashed  lines  indicate  the  interpolated  data  A 
delay  of  ^  for  example  can  now  be  achieved  by  selecting 
the  interpolated  samples  indicated  with  an  f  in  Figure  2. 
Interpolating with a factor $M = 4$ now allows for
beamsteering to 0°, ±7.18°, ±14.5°, ±22.0°, ±30.0°, ±38.7°,
±48.6°, ±61.0°, ±90.0°. Clearly, only one of every $M$
interpolated samples is used for beamforming.
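The synchronous steering angles follow directly from equation (2). The sketch below assumes half-wavelength sensor spacing and a sample rate of v times the Nyquist rate, so that an interpolation factor M gives sin θ = u/(vM) for integer u with |u| ≤ vM; the function name is hypothetical.

```python
import numpy as np

def synchronous_angles(v, M=1):
    """Steering angles in degrees for oversampling v and interpolation factor M.

    Assumes d = lambda_0 / 2, so that sin(theta_u) = u / (v * M).
    """
    n = int(np.floor(v * M))
    u = np.arange(-n, n + 1)
    return np.degrees(np.arcsin(u / (v * M)))

angles_plain = synchronous_angles(v=2)          # no interpolation: 5 angles
angles_interp = synchronous_angles(v=2, M=4)    # M = 4: 17 angles
```

For v = 2 and M = 4 the positive angles are arcsin(u/8) for u = 1, …, 8; in particular arcsin(7/8) is approximately 61.0°.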

3.  Interpolation  Beamforming 


4.  Polyphase  Decomposition 

In this section an efficient implementation of the
interpolation beamformer is presented. First, consider the data
processing in Figure 3 for the $i$-th sensor only. This is depicted
in Figure 5(a). Here the delay $p_iT_s'$ is interchanged with the
interpolation filter $H$. This is allowed since $H$ is linear and
time invariant. In Figure 5(b) filter $H$ is decomposed into
the filters $H_0, H_1, \ldots, H_{M-1}$ using polyphase decomposition
[1]. The impulse responses of the $H_j$ can be calculated from
the impulse response of $H$ according to


The interpolation process for a single beam is depicted
in Figure 3. First the sensor data is sampled at a rate equal
to or exceeding the Nyquist sampling rate. The $i$-th sampled
sensor output is zero padded to obtain $\bar{x}_i[k'T_s']$

$$\bar{x}_i[k'T_s'] = \begin{cases} x_i[k'T_s'], & \text{for } k' = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

with $T_s' = T_s/M$. Then the $\bar{x}_i$ are filtered with the interpolation
filter $H$ to obtain the $\tilde{x}_i$

$$\tilde{x}_i[k'T_s'] = \sum_{l=0}^{L-1} h[lT_s']\,\bar{x}_i[(k'-l)T_s']. \qquad (4)$$


0, 


fork'  6  K 
otherwise 


(6) 


fori  =  0,l,...,M-land«  =  {0,M,...,M 
Down-sampling these filter outputs is equivalent to
down-sampling the data and then filtering with $H_j'$, as depicted in
Figure 5(c). The impulse responses of the $H_j'$ are given by

$$h_j'[kT_s] = h_j[kMT_s'], \qquad (7)$$

for $k = 0, 1, \ldots, L/M - 1$. Up-sampling with a factor $M$,
delaying over $p_iT_s'$ and down-sampling with a factor $M$ is
equal to delaying over $\frac{p_i}{M}T_s$ if $p_i$ is an integer multiple of
$M$, and equal to zero otherwise. Therefore Figure 5(c) can
be interpreted as choosing the filter $H_q'$ and delaying over
$rT_s$, where

$$q = M\left\lceil \frac{p_i}{M} \right\rceil - p_i, \qquad r = \frac{q + p_i}{M} = \left\lceil \frac{p_i}{M} \right\rceil. \qquad (8)$$


The resulting scheme is depicted in Figure 5(d). The
combination of the delay of $rT_s$ and the sub-filter $H_q'$ represents
an approximation of the desired delay $p_iT_s'$. The sub-filter
must be of sufficient length to guarantee a good
approximation of the desired delay. In conflict with this requirement,
a long sub-filter requires many multiplications per second,
and introduces a long beamformer delay.
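The equivalence between the interpolation path and the compact polyphase path can be checked numerically. This is a minimal sketch under stated assumptions: signals are finite and zero before the first sample, the delay p is a non-negative integer number of fine samples, the filter length is at least M, and the sub-filter is taken as h[q], h[q+M], h[q+2M], … with q and r as in (8). Function names and the random test data are hypothetical.

```python
import numpy as np

def interpolation_path(x, h, M, p):
    """Zero-pad by M, filter with h, delay by p fine samples, keep every M-th sample."""
    up = np.zeros(len(x) * M)
    up[::M] = x                                     # zero padding, as in (3)
    fine = np.convolve(up, h)                       # interpolated data, as in (4)
    delayed = np.concatenate([np.zeros(p), fine])   # delay over p fine samples
    return delayed[: len(x) * M : M]                # down-sampling, as in (5)

def polyphase_path(x, h, M, p):
    """Select sub-filter q and apply an integer delay r at the low rate."""
    r = -(-p // M)            # ceil(p / M)
    q = M * r - p             # sub-filter index, 0 <= q < M
    sub = h[q::M]             # taps h[q], h[q + M], h[q + 2M], ...
    y = np.convolve(x, sub)
    delayed = np.concatenate([np.zeros(r), y])
    return delayed[: len(x)]

# Assumed test data: random signal, random 12-tap filter, M = 4.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)
h = rng.standard_normal(12)
M = 4
max_difference = max(
    np.max(np.abs(interpolation_path(x, h, M, p) - polyphase_path(x, h, M, p)))
    for p in range(0, 13)
)
```

The polyphase path uses only L/M multiplications per output sample and never forms the up-sampled sequence, which is where the saving discussed in the next section comes from.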


5.  Comparison  of  Computational  Complexity 


is most efficient when $M < 2N$ and $N > N_B$. Thus the
proposed method outperforms its alternatives when a high
angular resolution is required ($M > 2N$ and $M > 2N_B$).
This scheme has a gain in computational complexity of
$\frac{M}{2N_B}$ and $\frac{M}{2N}$ over the schemes of Figures 3 and 4,
respectively. The quality of
the delays formed depends on the sub-filter length $w = L/M$.
When both the filter length $L$ and the interpolation factor
$M$ are increased by the same factor, the number of
beam-pointing directions further increases while the
computational complexity does not increase for the proposed
method. Consequently, the angles for which beams can be
formed can now be chosen with arbitrary precision while
maintaining the same number of multiplications per second.
In practice, however, the filter length $L$ is limited since the tap
weights are stored in a finite amount of memory. For the
two alternative methods the computational complexity does,
however, increase proportionally with the filter length $L$.


As a measure of complexity, the number of
multiplications per second of the interpolating filter is considered.
The filter $H$ is assumed to be of length $L = wM$ throughout
this section, with $w$ an integer. Although this is not necessary, it
gives more insight into the calculation of the complexity. The
proposed beamformer is compared with the beamformers of
Figures 3 and 4.

In the scheme of Figure 3, $N$ filters of length $L$ are
calculated at a rate $f_s'$, with $N$ the number of sensors. Using
$f_s' = Mf_s$ and taking advantage of the sparse input data of
the interpolation filters, this yields a complexity of $LNf_s$
multiplications per second. This complexity is independent
of the number of beams to be calculated, and is therefore
efficient when a large number of beams is required.
Furthermore, assuming that the filter is a linear phase filter,
the number of multiplications per second can be reduced
by approximately a factor 2, yielding a complexity of $\frac{LNf_s}{2}$
multiplications per second.

Filter $H$ in Figure 4 is calculated for each beam at a rate
$f_s$, since only one of every $M$ samples is needed. The
number of multiplications per second equals $LN_Bf_s$, with $N_B$
the number of beams to be calculated. Again, assuming that
the interpolation filter is a linear phase filter, the resulting
number of multiplications per second equals $\frac{LN_Bf_s}{2}$.

The scheme in Figure 4 is more efficient than the scheme
in Figure 3 if and only if the number of beams to be
calculated is smaller than the number of sensors ($N_B < N$).

In the proposed beamformer, sub-filters are of length
$w$. For each beam, $N$ sub-filters are calculated at a rate
$f_s$. The total number of multiplications per second equals
$wNN_Bf_s = \frac{LNN_Bf_s}{M}$. In general, it is not possible to
exploit the linear phase property of the interpolation filter
to reduce the complexity further. When $M < 2N_B$ and
$N < N_B$, interpolating all sensor data as in Figure 3 is most
efficient. Combining all interpolation filters as in Figure 4

6.  Example  of  Computational  Complexity 

Next, an example is given to show that the conditions for
the proposed method to be more efficient than its
alternatives are easily met. Consider sub-filters of length $w = 10$,
$N = 7$ sensors, $M = 20$ (41 different synchronous angles)
and $N_B = 5$ (five beams are formed). The main lobes of
the unshaded beam-patterns corresponding to the resulting
synchronous beam-pointing directions are depicted in
Figure 6 for $\theta$ between 0° and 90°. For negative angles, the
figure is symmetrical. The figure shows that there indeed
is a need for a high $M$ to exploit the best possible angular
discrimination. However, when $M$ is chosen much larger,
the angular discrimination no longer improves, as the
successive beams merely overlap. In practice, $M$ will be
between 15 and 40 for a 7-sensor array which is sampled
at the Nyquist rate, and the proposed method outperforms
the alternatives discussed. The complexity for the proposed
beamformer equals $wNN_Bf_s = 350f_s$ multiplications per
second for this example. The alternative schemes of Figures 3
and 4 require $\frac{LNf_s}{2} = 700f_s$ and $\frac{LN_Bf_s}{2} = 500f_s$
multiplications per second, respectively. A significant efficiency gain
is thus obtained.
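The three complexity figures of this example can be reproduced with a few lines. The sketch below assumes the linear-phase halving applies to the schemes of Figures 3 and 4 but not to the proposed scheme, as stated in Section 5; the function name and dictionary keys are hypothetical, and all results are in units of the sample rate f_s.

```python
def multiplications_per_second(w, M, N, N_B):
    """Multiplication counts in units of f_s for the three beamformer schemes."""
    L = w * M  # full interpolation filter length
    return {
        "figure_3": L * N / 2,      # interpolate every sensor, linear-phase filter
        "figure_4": L * N_B / 2,    # one filter per beam, linear-phase filter
        "proposed": w * N * N_B,    # one length-w sub-filter per sensor and beam
    }

counts = multiplications_per_second(w=10, M=20, N=7, N_B=5)
gain_over_figure_3 = counts["figure_3"] / counts["proposed"]  # equals M / (2 * N_B)
gain_over_figure_4 = counts["figure_4"] / counts["proposed"]  # equals M / (2 * N)
```

Doubling both `w * M` and `M` (that is, keeping `w` fixed) leaves the proposed count unchanged while the other two double, which matches the scaling argument of Section 5.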

7.  Conclusions  and  Future  Research 

A  new  method  using  polyphase  decomposition  was  pro¬ 
posed  for  reduced  complexity  interpolation  delay  and  sum 
beamforming.  Significant  computational  savings  are  re¬ 
ported  for  beamformers  with  a  high  angular  discrimination. 

In future research the polyphase equivalent scheme will
be used to study relations between interpolation
beamforming and other broadband array processing techniques. The
use of adaptive algorithms to track moving sources
using delay and sum beamformers will also be considered in
future research.
Figure 3. Interpolation beamformer

Figure 4. Beamformer equivalent

Figure 5. Interpolation filter (a), polyphase decomposed
filter (b), polyphase equivalent (c)
and compacted polyphase equivalent (d).

Figure 6. Beam-pointing patterns for all synchronous
directions, M = 20 and N = 7
Abstract

This paper describes the design requirements for
a QAM demodulator chip recently developed to be
part of a settop convertor for digital cable television.
The chip demodulates 64- and 256-QAM signals at
a maximum bit rate of 44 Mb/s and uses blind acquisition
techniques so that no training or pilot signals
need be sent by the transmitter.


1  Introduction 


The desire to send many bits of data per Hertz
of transmission bandwidth has led to the development
of sophisticated communications systems using
quadrature amplitude modulation (QAM). First
introduced for voiceband modems [1], the technology
was then applied to microwave radio relay systems [2].
Its success in those applications has led
to great interest in its use for other communication
situations in which economic or regulatory considerations
limit the available transmission bandwidth.
An important example of such an application is the
wireless and cable distribution of digital television
[3]. This paper describes how digital transmission
is used for cable television distribution and how the
characteristics of a cable system affect the design of
a suitable demodulator.


CRJ’s research on blind equalization is currently supported
in part by NSF Grant MIP-9509011 and Applied Signal
Technology.
2 Background


Figure 1 shows the block diagram of a digital
communications system. The input data is applied
to the modulator and transmitter, which convert
the data stream into a bandlimited analog waveform
and frequency-translate it into the frequency
band appropriate for transmission. As the signal
propagates to the receiver it is delayed, attenuated,
and sometimes distorted in a frequency-selective
manner. These effects, on which we will elaborate
shortly, are modeled in the block diagram as
the propagation channel. The receiver accepts the
channel output, plus noise and interference inadvertently
present at the receiver input, and attempts
to recover the input data sequence.

In the particular case of digital cable television
transmission, Figure 1 gives way to the system
shown in Figure 2. Compressed video, audio, telephony,
and even other data services are combined
into a composite data stream and modulated onto
a carrier wave. Many of these, plus, possibly, older
analog television signals, are summed together for
transmission and distribution. Older systems do the
transmission on coaxial cables only. Newer systems
use both fiber optic transmission and coaxial cable
(as shown in Figure 2), while the newest promise to
send the signals directly to the customer premises
on fiber. Once in the customer premises the signal
is commonly split and distributed to many devices,
including VCRs, television sets, and, in the future,
cable modems.

More detail on the “headend” is shown in Figure 3.
The video and audio for a particular television
source are digitized and compressed. The
resulting output data rate depends on the type of
compression used and the desired fidelity of the received
image and sound. Quality comparable to
high-SNR NTSC transmission can be obtained using
MPEG-2 compression, yielding an average bit
rate of about 6 Mb/s. High definition television
(HDTV), with its larger screen size and greater resolution,
requires about 25 Mb/s. Since the modulation
anticipated (more on this shortly) can carry
a raw data rate of 30 to 40 Mb/s, this permits several
digitized video/audio sources to be multiplexed
together on a single “digital signal”. This multiplexing
can be done deterministically, that is, by
giving each of the sources a fixed bit rate allocation,
or it can be done dynamically, allowing the
number of sources, their quality, and the types of
sources to be managed by the headend. For example,
one HDTV and two normal resolution TV
sources might share the bandwidth, or, conversely, a
sports program with a high degree of motion might
be allowed to use some of the bits allocated to a
video signal with a more static image.
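
The multiplexing budget described above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the average rates quoted in the text (6 Mb/s and 25 Mb/s), ignores error-correction and framing overhead, and the function names are hypothetical.

```python
# Average program bit rates quoted in the text, in Mb/s.
SD_RATE = 6.0     # NTSC-quality MPEG-2 program
HDTV_RATE = 25.0  # high-definition program


def fits(n_hdtv, n_sd, pipe_rate):
    """Return True if the programs' average rates fit in the raw pipe rate."""
    return n_hdtv * HDTV_RATE + n_sd * SD_RATE <= pipe_rate


def max_sd_programs(pipe_rate):
    """Largest number of NTSC-quality programs that fit in the pipe."""
    return int(pipe_rate // SD_RATE)


# One HDTV plus two normal-resolution programs needs 37 Mb/s on average.
print(fits(1, 2, 40.0), fits(1, 2, 30.0))
print(max_sd_programs(30.0), max_sd_programs(40.0))
```

Under these assumptions the HDTV-plus-two mix fits only near the upper end of the 30 to 40 Mb/s range.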

Once the 30 to 40 Mb/s stream of multiplexed
and possibly encrypted signals is put together, forward
error correction bits are added and the composite
stream is modulated onto a carrier. These
modulated signals have a bandwidth less than 6
MHz so that they can be frequency-division multiplexed
(FDM) onto the cable transmission medium.
By adhering to the 6 MHz spacing, compatibility is
maintained with older analog TV transmission systems.
Note also that the 30 to 40 Mb/s carried on
the modulated signal need not be used completely
(or even partially) for television. Such a “data pipe”
is usually referred to as a “cable modem”, and can be
used for a variety of services, including high-speed
Internet connectivity from a server to a computer
at the home or office.

In comparing Figure 1 with Figure 2 we see that
the propagation channel includes all of the cable
distribution equipment from the modulator output
to the demodulator input in the destination device,
and therefore includes all up- and downconverters,
bandpass filters, combiners, trunk amplifiers, coaxial
cable runs, and splitters. How these devices
and equipment disrupt the signal’s transmission can
be understood after a discussion of the method by
which the digital data is prepared for transmission
over the analog medium.

Modern bandwidth-efficient transmission of digital
data is based on the concept of sending pulses [4].
The input data is partitioned into sets of N bits
and those bits are then used to determine the phase
angle and peak amplitude of the pulse. The pulse
shape itself is chosen to ensure a bandlimited signal
spectrum. The receiver is designed to determine the
amplitude and phase of each incoming pulse, determine
which of the 2^N possibilities has been sent,
and then report out the corresponding N bits. If
the pulses are transmitted at the symbol or baud
rate of f_b symbols per second, then the transmission
system can carry N · f_b bits per second.
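
As a worked instance of the N · f_b relation, the following sketch computes the raw bit rate for the two constellation sizes discussed in this paper. The 5 MBaud symbol rate is the approximate figure quoted in Section 3; the function names are illustrative only.

```python
def bits_per_symbol(constellation_size):
    """Return N such that 2**N equals the constellation size."""
    n = constellation_size.bit_length() - 1
    if 2 ** n != constellation_size:
        raise ValueError("constellation size must be a power of two")
    return n


def raw_bit_rate(constellation_size, baud_rate):
    """Raw bit rate N * f_b in bits per second."""
    return bits_per_symbol(constellation_size) * baud_rate


rate_64 = raw_bit_rate(64, 5e6)    # 6 bits/symbol
rate_256 = raw_bit_rate(256, 5e6)  # 8 bits/symbol
print(rate_64 / 1e6, rate_256 / 1e6, rate_64 / rate_256)
```

At 5 MBaud this gives 30 and 40 Mb/s, and shows that dropping from 256-QAM to 64-QAM gives up one quarter of the raw rate.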

The effect of the cable transmission plant is to
disperse the transmitted pulses in time. Its effect
on a QAM signal is often assessed by looking at the
signal’s constellation. This is an overlay of many
received symbol measurements. In the absence of
noise, interference, and dispersion, and with perfect
estimation of the signal’s amplitude, carrier, and
timing, the received measurements from a 64-QAM
signal should look as they do in Figure 4(a). The
presence of dispersion alone is sufficient to produce
the degradation seen in Figure 4(b). In the absence
of additive noise and receiver imperfections,
the displacement between an actual received constellation
point and the transmitted point shown in
Figure 4(a) is a combination of the channel dispersion’s
effect on the particular pulse being considered
and the intersymbol interference (ISI) induced
by the channel on the adjacent pulses. Some of the
received symbols are displaced sufficiently that the
nearest-neighbor decision rule makes errors. Because
of the potential for frequent errors from this
source, the demodulator requires an adaptive equalizer
of some type to compensate for the effects of the
cable plant’s dispersion.
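
The effect of dispersion on the nearest-neighbor decision rule can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the channel is a hypothetical symbol-spaced two-tap model (not one of the measured cable channels), there is no noise, and the function names are invented for the example.

```python
import numpy as np

LEVELS = np.arange(-7, 8, 2)  # 64-QAM levels per axis: -7, -5, ..., 7


def qam64_symbols(n, rng):
    """Draw n random 64-QAM symbols on the odd-integer grid."""
    return rng.choice(LEVELS, n) + 1j * rng.choice(LEVELS, n)


def nearest_neighbor(received):
    """Slice each received point to the nearest 64-QAM constellation point."""
    def slice_axis(v):
        return np.clip(2 * np.round((v - 1) / 2) + 1, -7, 7)
    return slice_axis(received.real) + 1j * slice_axis(received.imag)


def symbol_errors(channel, n=2000, seed=0):
    """Count decision errors after a noise-free, symbol-spaced channel."""
    rng = np.random.default_rng(seed)
    sent = qam64_symbols(n, rng)
    received = np.convolve(sent, channel)[:n]
    return int(np.sum(nearest_neighbor(received) != sent))


errors_ideal = symbol_errors([1.0])           # no dispersion
errors_echo = symbol_errors([1.0, 0.3])       # hypothetical single echo
print(errors_ideal, errors_echo)
```

With the ideal channel every decision is correct; with the assumed echo, ISI alone pushes some points across decision boundaries, which is the situation an equalizer is meant to correct.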

In further comparing Figure 1 with Figure 2 we
see that the noise and interference includes all noise
introduced by the active components of the system
plus the interference produced inside the system
and received from outside it. The noise is usually
controlled by careful design and maintenance of
the system. The interference, usually referred to as
ingress, is combated by minimizing any intermodulation
distortion within the system and by ensuring
good maintenance of the system to prevent strong
externally generated signals, such as from radio or
broadcast television, from entering the distribution
system.
3 The Demodulator’s Requirements


In  light  of  this  background,  the  requirements  for 
the  demodulator  can  be  enumerated: 


• The modulator/demodulator pair must reliably
carry as much data as possible over currently
available cable systems. The use of
QAM on a 6 MHz channel limits the baud
rate to about 5 MHz. The noise floor present
in a well-engineered conventional cable system
limits the QAM constellation size to about
256. For poorer systems, a constellation size
of 64 might be used to add some “system margin”
at the expense of 25% of the available
transmission rate. Thus, the demodulator is
required to handle up to 256-QAM with a
maximum baud rate of 5.5 MHz or so.


• The demodulator must operate in a non-cooperative
manner, that is, it should not
need any special training or synchronization
from the transmitting modulator. Further,
the user should be able to change channels
rapidly (“channel surf”) without substantial
re-acquisition delays being introduced by the
demodulator.


•  The  demodulator  needs  to  be  cheap  and  to 
operate  with  other  cheap  components. 


• Finally, the demodulator must handle the signal
impairments to which it may be subjected.
These include both the interference (intermodulation
distortion and “ingress”) and the
signal dispersion introduced by the distribution
plant itself. Conventional demodulator
design cannot inexpensively deal with large
amounts of interference and therefore these
are traditionally handled by vigilant system
maintenance. Channel dispersion, however, is
a fundamental characteristic of a distribution
plant and the demodulator must compensate
for it. In order to reach a suitable specification
for it, however, we must first determine
the degree of dispersion present in cable TV
systems.


4 Characterizing the Cable Television
Propagation Channel


While modeling of the cable propagation channel
can be, and has been, done analytically, the approach
taken here is to measure it in real cable television
systems. We first describe the method employed
and then the results.

In practical circumstances the propagation characteristics
of the channel between a transmitter and
receiver are not known a priori. Further, a one-time
calibration of a channel’s characteristics is not useful
since channels are known to vary with time owing
to influences from environmental and manmade
factors. To deal with this time variation it is useful
to have channel modeling techniques which can
use “signals of opportunity” to probe the channel to
be analyzed. The method used here, first described
by Gooch and Harp [6], uses a demodulator to obtain
symbol estimates from a PSK or QAM signal
of opportunity and then uses these symbols along
with the received signal itself as inputs to a channel
modeller. This scheme is shown in Figure 5.
The key to this technique’s success is the use of a
blind equalizer in the demodulator to “open the eye”
enough for the demodulator to initially acquire the
signal. Once acquisition has occurred, the demodulator
begins to use its own symbol decisions as the
desired input to an LMS-directed equalizer update
algorithm. (See Wolff, Treichler, and Gooch [10] for
an early description of such a demodulator.) These
symbol decisions are, of course, the same regenerated
symbols needed as one of the inputs to the
modeling stage.

Gooch and Harp [6] used an LMS-directed FIR
adaptive filter to estimate the pulse response of the
propagation channel. The filter’s input is the stream
of regenerated symbols, interpolated with alternate
zeros to create a fractionally-sampled input rate of
2 f_b, where f_b is the symbol or baud rate of the
received signal. The reference or desired input to
the adaptive modeller is a version of the input signal
delayed to compensate for the processing delay
of the demodulator. The LMS algorithm is used
to adapt the coefficients of the complex-valued filter
pulse response. The convergent solution is well
known to closely approximate the least-squares fit
between the actual channel and the model. The
error signal e(k) contains unmodeled components,
misadjustment noise, and receiver noise. In passing
it should be noted that this error signal can be
spectrum analyzed to reveal the presence and characteristics
of additive signal impairments such as
cochannel interference [6]. Use of this approach to
identify ingress into a cable system will be discussed
in Section 6.
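
The modeling loop just described can be sketched in a few lines. This is a simplified illustration, not the implementation of [6]: it assumes the regenerated symbols are already time-aligned with the received samples, uses QPSK symbols and a hypothetical T/2-spaced channel for the test signal, and all names and parameter values are chosen for the example.

```python
import numpy as np


def zero_stuff(symbols):
    """Interleave zeros between symbols to reach a rate of 2 * f_b."""
    stuffed = np.zeros(2 * len(symbols), dtype=complex)
    stuffed[::2] = symbols
    return stuffed


def lms_channel_model(symbols, received, n_taps, mu):
    """Adapt an FIR model of the channel; return (taps, error signal)."""
    u = zero_stuff(symbols)
    w = np.zeros(n_taps, dtype=complex)
    errors = np.zeros(len(u), dtype=complex)
    for n in range(n_taps - 1, len(u)):
        window = u[n - n_taps + 1:n + 1][::-1]   # u[n], u[n-1], ...
        e = received[n] - np.dot(w, window)      # model error e(n)
        w += mu * e * np.conj(window)            # complex LMS update
        errors[n] = e
    return w, errors


# Synthetic test signal (all values are assumptions for illustration).
rng = np.random.default_rng(1)
symbols = (rng.choice([-1, 1], 2000) + 1j * rng.choice([-1, 1], 2000)) / np.sqrt(2)
true_channel = np.array([1.0, 0.4 - 0.2j, 0.0, 0.15j])
received = np.convolve(zero_stuff(symbols), true_channel)[:2 * len(symbols)]
received = received + 0.01 * (rng.standard_normal(received.shape)
                              + 1j * rng.standard_normal(received.shape))

estimate, errors = lms_channel_model(symbols, received, n_taps=8, mu=0.05)
print(np.round(estimate, 2))
```

In this sketch the first four estimated taps approach the assumed channel and the remaining taps stay near zero; the returned error signal is the quantity that the text suggests analyzing spectrally.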

An example of the result of this modeling procedure
is shown in Figure 6. The power spectrum
of a 64-QAM, 5.1 MBaud signal appears in Figure 6(a).
Adjacent to it is the power transfer function
of the estimated channel. This was obtained
by first developing an FIR model of the channel
pulse response, as described above, and then computing
the log magnitude squared of the FFT of that
complex-valued pulse response shown in Figure 7.
Note the close correspondence of the channel shaping
between the received spectrum and that of the
model.
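
The last step, going from an FIR pulse-response model to a power transfer function, is a short computation. The sketch below assumes NumPy; the FFT length and the two-tap test response are arbitrary choices for illustration.

```python
import numpy as np


def power_transfer_db(pulse_response, n_fft=256):
    """Log magnitude squared (in dB) of the FFT of a pulse response."""
    spectrum = np.fft.fftshift(np.fft.fft(pulse_response, n_fft))
    # The small constant avoids log10(0) at exact spectral nulls.
    return 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)


flat = power_transfer_db(np.array([1.0]))
shaped = power_transfer_db(np.array([1.0, 0.5]))
print(flat.max(), shaped.max(), shaped.min())
```

A single-tap response gives a flat 0 dB curve, while the two-tap example ranges from about +3.5 dB to about -6 dB across the band.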

By inspecting the log magnitude of the estimated
pulse response in Figure 7, we can see that the channel
does not conform to a simple two- or three-ray
specular model but in fact the received signal is
the combination of many delayed and scaled versions
of the transmitted symbol stream. A more
detailed examination of many such channel estimates
indicates that the dispersion can be broken
into two classes, “microreflections” and “macroreflections”.
The macroreflections have large amplitude
compared with the transmitted signal and have
relative delays on the order of microseconds, indicating
a strong reflection from the end of a long improperly
terminated stub. (Recall that the round-trip
delay on a coaxial cable is about 18 microseconds/mile.)
The microreflections are multitudinous
but small in amplitude, stemming from a large number
of lower level reflections on short cable sections
within the system. The macroreflections must be
found by the maintenance crews and removed, since
building the demodulator to compensate for them
is uneconomical. The microreflections, however, are
a fact of life even in a well-designed, well-maintained
system and must be accommodated by the demodulator.
Examination of a large number of the channel
models of the type seen in Figure 7 shows that
a reasonable estimate of the maximum delays seen
for a cable system’s microreflections is 2 to 3 microseconds.
A database (to be resident in the National
Science Foundation’s Signal Processing Information
Base (SPIB) at Rice University and linked to
http:  / /www.ee.cornell.edu/  faculty/R  Johnson.html) 
includes a representative sample of the received signals
used to draw these conclusions.


Given this estimate for the maximum delay
spread of 3 microseconds for the cable propagation
channel, how long does the demodulator’s equalizer
need to be? This question has been recently addressed
in [9], which discusses the recent technical
result that a fractionally-spaced equalizer need be
no longer than the maximum expected delay spread
of the channel. In light of this result the length of
the fractionally spaced equalizer should be at least
16 symbols (since the symbol rate of 5.1 MBaud is
less than 16 symbols / 3 microseconds).
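
The 16-symbol figure follows directly from the delay spread and the symbol rate. The sketch below repeats that arithmetic; the function name is illustrative, and the tap count for T/2 spacing is simply twice the symbol span under that assumed spacing.

```python
import math


def equalizer_span_symbols(delay_spread_s, baud_rate_hz):
    """Smallest whole number of symbol periods covering the delay spread."""
    return math.ceil(delay_spread_s * baud_rate_hz)


span = equalizer_span_symbols(3e-6, 5.1e6)   # 15.3 symbol periods, rounded up
taps_t2 = 2 * span                           # taps if spaced T/2 apart
print(span, taps_t2)
```

For a 3 microsecond spread at 5.1 MBaud this gives a span of 16 symbols.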


5  The  Demodulator  Design 


Many different approaches have been used to design
a demodulator for digital signals. An indication
of the choices available in this design process
is shown in Figure 8. In general the demodulator
must (1) bandpass filter the incoming signal,
(2) adjust the input signal amplitude, (3) estimate
and remove any carrier component, (4) equalize the
channel’s dispersive effects, (5) “slice” the input signal
to obtain pulse amplitude and phase measurements,
(6) decide which pulse amplitude and phase
was actually transmitted, and (7) convert that decision
into the associated bit pattern. Demodulators
for digital cable transmission incorporate forward
error correction as well.

Even though it is only one component of the demodulator,
the adaptive equalizer’s design takes on
special importance for three reasons: (1) its performance
is crucial to the goal of maximizing the transmission
rate through the dispersive channel, (2) it is
the most complicated of all the demodulator’s components,
and (3) it consumes a large fraction of the
computation needed to implement the complete demodulator.
Amplifying on this third point, it is
not unusual for the adaptive equalizer to consume
more than 80% of the multiply/add cycles needed
to demodulate a 256-QAM signal. Given that, it
becomes an important design consideration to limit
the length of the equalizer to only that required to
handle adequately the range of propagation channels
expected to be encountered.

With an eye toward minimizing and simplifying
the computation needs, early demodulators used
so-called T-spaced equalizers [4]. After filtering, gain
control, and carrier removal, the input signal was
sampled once per symbol (pulse). The timing of
this sampling clock was adjusted so that the samples
were taken at the “top dead center” of the
received pulses. These samples then entered the
equalizer’s tapped delay line filter. A linear combination
of them was fed on to the measurement
and comparison stages. Error measurements made
in the decision circuit were fed back to the equalizer’s
adaptation algorithm to optimize the choice
of filter weighting coefficients.

While theoretically reasonable (i.e., “one sample
per pulse”) and computationally desirable, practical
design of high speed modems has gravitated
away from T-spaced equalizers and toward the use
of fractionally-spaced equalizers (FSE), so called because
the equalizer taps are closer together in time
than the symbol interval T. Equivalently, and perhaps
more intuitively, this means that the input to
the equalizer is sampled faster than the symbol rate
f_b. The output rate is still at the symbol rate,
making the FSE a decimating or even resampling
filter.

If the temporal spreads of the two equalizers are held
to the same value, then the FSE obviously consumes
more computation than a T-spaced design. Why
then use an FSE? The answer is that even though
it requires more computation, it simplifies the
rest of the demodulator’s design and allows it to
work at virtually theoretical levels. The principal
reason for this is that even though the pulses arrive
at rate f_b, the actual bandwidth of the signal is
somewhat larger, typically 10 to 40% higher. As a
result, sampling the conditioned input at the rate
of f_b Hz is not enough to satisfy the Nyquist theorem.
While this would not be important if all parameters
of the signal were known, the fact that the input signal
must be processed to extract timing and carrier information
means that sampling at f_b is not fast
enough. There are also some curious signal cancellation
effects that arise when the signal components
alias into a band of only f_b Hertz.

The actual input rate to the FSE is usually governed
by a variety of hardware considerations. The
rate must be high enough to satisfy the sampling
theorem, but lowering the rate reduces the computational
requirements. The most common choice
is to sample the conditioned input signal at exactly
twice the symbol rate f_b, making the filter
tap spacing equal to T/2, half of the symbol spacing.
The resulting equalizer thus decimates its input by
a factor of 2, producing one output for every two
input samples. It is not uncommon to operate at a
fractional rate either. The demodulator chip to be
described shortly uses a fractional-rate design, in which
a digital timing recovery circuit and resampler supplies
complex-valued samples into the equalizer at a rate
only 20% higher than the symbol rate.
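
The decimating behavior of a T/2-spaced equalizer can be made concrete with a short sketch. It shows only the filtering structure for the common factor-of-2 case (not the adaptation, and not the fractional-rate resampling used in the chip); the function name and the test taps are assumptions for illustration.

```python
import numpy as np


def fse_t2_output(samples, taps):
    """T/2-spaced FIR equalizer: one output for every two input samples."""
    taps = np.asarray(taps, dtype=complex)
    n_taps = len(taps)
    outputs = []
    # Step the tapped delay line by two input samples per output symbol.
    for n in range(n_taps - 1, len(samples), 2):
        window = samples[n - n_taps + 1:n + 1][::-1]  # newest sample first
        outputs.append(np.dot(taps, window))
    return np.array(outputs)


samples = np.arange(20, dtype=complex)      # stand-in for 2 * f_b input data
out = fse_t2_output(samples, [1.0, 0.0, 0.0, 0.0])
print(len(samples), len(out))
```

With a single unit tap the filter simply passes every second input sample, which makes the 2:1 rate change easy to see.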

In response to the requirements described in
Sections 3 and 4, the QAM demodulator chip pictured
in [9] was developed. The demodulator chip
fits into a settop convertor design of the general type
shown in Figure 9. A conventional TV tuner is used
to extract the selected RF channel and translate
it to the standard 45 MHz IF. This analog signal
is then bandpass sampled and the resulting 8-bit
samples are applied to the demodulator chip. The
chip first measures the power of the input and feeds
back a control signal to the amplifier which precedes
the A/D. This loop constitutes an automatic
gain control (AGC). The signal is then quadrature
downconverted to produce a complex-valued sample
stream. The image rejection filtering is performed
asynchronously to the input clock in such
a way that the filter output rate is synchronous to
the QAM symbol rate. This “asynchronous resampling”
is controlled by a circuit which extracts a
tone at the symbol rate and feeds information back
to the filter. The resulting rate-synchronous sample
stream is applied to a fractionally spaced adaptive
equalizer. Its output, decimated to exactly one
complex sample per symbol, is applied to the digital
carrier tracking loop, which removes residual carrier
frequency and phase, produces “soft decisions”, and
quantizes the soft decisions to produce 8-bit symbol
outputs. The initial prototype chip consumes about
3 watts and executes the equivalent of 700 million
multiplications per second.

In order to let the viewer select any TV channel
at will, the demodulator must be able to acquire all
of its tracking parameters, including the equalizer,
without aid from the transmitter. To do this the
adaptive equalizer uses the Constant Modulus Algorithm
[7, 8] to initially “open the eye” and then automatically
switches over to decision direction once
carrier phase acquisition is complete. Decision feedback
is not employed owing to the pipelined nature
of the chip’s VLSI design.
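
A minimal sketch of blind acquisition with the Constant Modulus Algorithm is given below. It is not the chip’s implementation: for brevity it uses a symbol-spaced equalizer, QPSK symbols, a hypothetical two-tap noise-free channel, and step-size and length values chosen only for the example; the switch to decision direction is not shown.

```python
import numpy as np


def cma_equalize(x, n_taps, mu, r2):
    """Adapt equalizer taps with the CMA stochastic-gradient update."""
    w = np.zeros(n_taps, dtype=complex)
    w[n_taps // 2] = 1.0                          # center-spike start
    for n in range(n_taps - 1, len(x)):
        window = x[n - n_taps + 1:n + 1][::-1]    # x[n], x[n-1], ...
        y = np.dot(w, window)                     # equalizer output
        # Gradient of (|y|^2 - r2)^2 with respect to conj(w).
        w -= mu * y * (np.abs(y) ** 2 - r2) * np.conj(window)
    return w


def residual_isi(channel, w):
    """ISI power relative to the main tap of the combined response."""
    power = np.abs(np.convolve(channel, w)) ** 2
    return (power.sum() - power.max()) / power.max()


rng = np.random.default_rng(2)
symbols = (rng.choice([-1, 1], 5000) + 1j * rng.choice([-1, 1], 5000)) / np.sqrt(2)
channel = np.array([1.0, 0.25 + 0.1j])            # assumed dispersive channel
x = np.convolve(symbols, channel)[:len(symbols)]

# Dispersion constant for the source alphabet (equals 1 for unit QPSK).
r2 = np.mean(np.abs(symbols) ** 4) / np.mean(np.abs(symbols) ** 2)

w_start = np.zeros(11, dtype=complex)
w_start[5] = 1.0
isi_before = residual_isi(channel, w_start)
isi_after = residual_isi(channel, cma_equalize(x, n_taps=11, mu=0.01, r2=r2))
print(isi_before, isi_after)
```

No transmitted symbols are used in the update, only the modulus of the equalizer output, which is what allows acquisition without a training sequence. For 64- or 256-QAM the same update applies with r2 computed from that constellation, although convergence would be expected to be slower than in this QPSK example.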


6 Additional Uses of the QAM Demodulator Chip


While developed originally for use in digital cable
settop converters, the demodulator chip will
be useful for at least three other applications as
well. The first and second are for demodulation
of digital TV signals which are broadcast over radio
frequency (RF) channels instead of being sent
through a coaxial or fiber cable medium. High definition
television is to be transmitted in the US
over the same 6 MHz-wide VHF and UHF channels
over which analog television is now sent. Although
vestigial sideband (VSB) transmission is currently
planned, the ubiquity of QAM will probably
win out. Once it does, the QAM demodulator
chip can be used directly. The other broadcast
medium is Multipoint Microwave Distribution
Systems (MMDS), also called “wireless cable”, in
which analog and digital television signals of the
same structure as used for cable transmission are
sent instead over a broadcast signal in the 2.5 GHz
microwave band. Both of these scenarios have substantially
different propagation characteristics than
cable-transmitted signals usually do, implying that
the adaptive equalization used in the demodulator
must be robust and that the equalizer’s length must
match the delay spreads of 2 to 3 microseconds often
seen in the broadcast environment.

The third application of the demodulator is in
test equipment used for maintaining the cable system
itself. By using the demodulator chip as a part
of the block diagram shown in Figure 5 it is possible
to build a handheld piece of equipment capable of
noninvasively testing cable signals and characterizing
any problems encountered. Such a piece of test
equipment is shown in Figure 10. It can tune to any
RF channel, measure the signal quality, and test for
the presence of macroreflections and ingress. As an
example, consider Figure 11, which shows not only
the signal constellation and spectrum, but also the
channel model and ingress spectrum for an actual
cable TV signal. The plots indicate that the quality
degradation encountered stems not from a macroreflection
but in fact from ingress from a local FM
radio station.


7  Conclusion 


This paper has described a recently developed
blind QAM demodulator chip designed to be part of
a settop converter for digital cable television. The
design considerations affecting the blind equalizer
component, such as length and update algorithm,
have been stressed. Field operating data and some
data-based channel models for QAM transmission
across cable are being made available (as described
in [9]) as a stimulant to further research on blind
equalization useful in this application.
Figure 2. Block Diagram of a Modern Television
Cable Distribution System


Figure 3. Functional Block Diagram of the
Digital Multiplexer and Modulator for a Cable
Television “Headend”


Figure 4. The Impact of a Simple Linear
Channel on a 64-QAM Signal


Figure 5. A Technique for Developing FIR
Models for Propagation Channels Carrying
PSK or QAM Signals


Figure 6. A Modeling Example: (a) Received
power spectrum of a 64-QAM, 5.1
MBaud Cable Modem Signal, (b) The power
transfer function of the estimated propagation
channel


Figure 7. The Channel Estimate Impulse
Response Magnitude for the 64-QAM Signal


Figure 8. The Block Diagram of a Generic
Equalized Data Demodulator


Figure 11. Using the Gooch-Harp Technique
to Reveal Ingress in an Actual Digital
Cable TV System


MP-1:  Signal  Processing  for 
Communications  I 


AN  OUTER-PRODUCT  DECOMPOSITION  ALGORITHM  FOR 
MULTICHANNEL  BLIND  IDENTIFICATION 


Zhi  Ding 

Department  of  Electrical  Engineering 
Auburn  University 
Auburn,  AL  36839-5201 
Email: ding@eng.auburn.edu


Abstract 

Blind  channel  identification  and  equalization  have 
attracted  a  great  deal  of  attention  recently  due  to  their 
potential  application  in  mobile  communications  and 
digital  HDTV  systems.  In  this  paper,  we  present  a  new 
algorithm  based  on  channel  parameter  outer-product 
decomposition.  This  new  algorithm  can  be  viewed  as 
a  generalization  of  a  recently  proposed  linear  prediction 
algorithm.  It  produces  more  accurate  channel  estimates 
and is more robust to over-modeling errors in the channel
order estimate.

1  Introduction 

In popular data communication systems such as
digital mobile systems and digital HDTV systems, data
signals are often transmitted through unknown channels
which may introduce severe linear distortion. In
order to improve the system performance, it is important
for the receiver to remove channel distortions
through equalization or sequence estimation. Because
the available channel input training sequence may be
too short or even non-existent for channel identification,
blind channel identification can play useful roles
in these systems.

Blind channel identification relies solely on the received
channel output signal and some a priori statistical
knowledge of the original channel input signal.

A linear prediction based approach was first presented
by Slock [5] and was later generalized and
refined by Meriam et al. [6]. Unlike many of the subspace
methods that tend to be very unreliable when
the channel order is over-estimated, the linear prediction
approach is found to be rather robust. However,
as will become clear in this paper, the linear prediction
algorithm (LPA) does not fully exploit all the available
second order statistical information of the channel
output.

In order to derive a more robust algorithm for channels with weak precursor
impulse responses, this paper derives the channel estimate from the full
outer-product decomposition of the channel parameter vector. Our results will
show that, based on a complete outer-product decomposition, channel
identification can be significantly improved.

2  Problem  Formulation 

A multi-user QAM data communication system can be captured by a baseband
representation. If the $N$ user channels are all linear and causal with impulse
responses $\{h_u(t),\ u = 1, 2, \ldots, N\}$, the received output signal can
be written as

$$x(t) = \sum_{u=1}^{N} \sum_{k=-\infty}^{\infty} s_{k,u}\, h_u(t - kT) + w(t), \tag{2.1}$$

where $T$ is the symbol baud period and $\{s_{k,u}\}$ is the input signal set
of user $u$. The noise $w(t)$ is stationary, white, and independent of the
channel input sequences $s_{k,u}$, but not necessarily Gaussian. Note that
$h_u(t)$ is the "composite" channel impulse response that includes transmitter
and receiver filters as well as the physical channel response.

It is known that channel identification based on second order statistics is
possible only for oversampled channel output. Let the sampling interval be
$\Delta = T/p$, where $p$ is an integer. The oversampled discrete signals and
responses are

$$x_i = x(i\Delta), \quad h_u[i] = h_u(i\Delta), \quad w_i = w(i\Delta). \tag{2.2}$$

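Suppose $\{h_u(t)\}$ has finite support $[0, T_h)$, which spans $m_0 + 1$
integer periods. By defining the following notations

$$\begin{aligned}
\mathbf{x}[k] &\triangleq [\,x_{kp}\ \ x_{kp+1}\ \cdots\ x_{kp+p-1}\ \ x_{(k-1)p}\ \ x_{(k-1)p+1}\ \cdots\ x_{kp-Mp+1}\,]', \\
\mathbf{s}_k &\triangleq [\,s_{k,1}\ \ s_{k,2}\ \cdots\ s_{k,N}\,], \\
\mathbf{s}[k] &\triangleq [\,\mathbf{s}_k\ \ \mathbf{s}_{k-1}\ \cdots\,]', \\
\mathbf{w}[k] &\triangleq [\,w_{kp}\ \ w_{kp+1}\ \cdots\ w_{kp-Mp+1}\,]', \\
\mathbf{h}_u[i] &\triangleq [\,h_u[ip]\ \ h_u[ip+1]\ \cdots\ h_u[ip+p-1]\,]', \\
H_i &\triangleq [\,\mathbf{h}_1[i]\ \ \mathbf{h}_2[i]\ \cdots\ \mathbf{h}_N[i]\,],
\end{aligned}$$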
0-8 186-7576-4/96  $5.00  ©  1996  IEEE 
we can form an $Mp \times (m_0 + M)$ block Toeplitz matrix

$$\mathbf{H} = \begin{bmatrix}
H_0 & H_1 & \cdots & H_{m_0} & 0 & \cdots & 0 \\
0 & H_0 & H_1 & \cdots & H_{m_0} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\
0 & \cdots & 0 & H_0 & H_1 & \cdots & H_{m_0}
\end{bmatrix}. \tag{2.3}$$
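As an illustration of the structure in (2.3), the following minimal sketch builds the block Toeplitz matrix for a single user ($N = 1$), so that each block $H_i$ is a $p \times 1$ column. The function names `block_toeplitz` and `apply_channel` and the flat tap layout are assumptions made for this example, not part of the original paper.

```python
def block_toeplitz(h, p, M):
    """Build the Mp x (m0 + M) block Toeplitz matrix of (2.3) for one user.

    h : oversampled taps h[0], ..., h[(m0 + 1) * p - 1]  (assumed layout)
    p : oversampling factor
    M : number of stacked baud intervals
    """
    if len(h) % p != 0:
        raise ValueError("len(h) must be a multiple of p")
    m0 = len(h) // p - 1
    # H_i is the p x 1 block [h[ip], h[ip+1], ..., h[ip+p-1]]'.
    blocks = [h[i * p:(i + 1) * p] for i in range(m0 + 1)]
    H = [[0.0] * (m0 + M) for _ in range(M * p)]
    for r in range(M):  # block row r holds H_0 ... H_m0 shifted right by r columns
        for i, block in enumerate(blocks):
            for q in range(p):
                H[r * p + q][r + i] = block[q]
    return H


def apply_channel(H, s):
    """Noise-free channel output H s for a symbol vector s."""
    return [sum(a * b for a, b in zip(row, s)) for row in H]
```

For example, `block_toeplitz([1.0, 2.0, 3.0, 4.0], p=2, M=2)` gives a $4 \times 3$ matrix whose two block rows are shifted copies of $[H_0\ H_1]$.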

Clearly, $m_0$ is the order of the $N$ dynamic FIR channels. With these
notations, the sampled channel output signal vector can be written as

$$\mathbf{x}[k] = \mathbf{H}\,\mathbf{s}[k] + \mathbf{w}[k]. \tag{2.4}$$

Consequently, the channel output covariance matrix becomes

$$R_{m_0} = E\{\mathbf{x}[k]\,\mathbf{x}[k]^H\} = \sigma_s^2\,\mathbf{H}\mathbf{H}^H + \sigma_w^2 I, \tag{2.5}$$

assuming that the channel input signal is white with zero mean and
$R_s = E\{\mathbf{s}[k]\,\mathbf{s}[k]^H\} = \sigma_s^2 I$, while the noise is
spatially white with zero mean and
$R_w = E\{\mathbf{w}[k]\,\mathbf{w}[k]^H\} = \sigma_w^2 I$.

Our objective is to identify the channel $\mathbf{H}$ from the second order
statistics of the channel output signal $\mathbf{x}[k]$ given in $R_{m_0}$,
under the identifiability condition [1] that both $\mathbf{H}$ and $R_s$ are
full-rank. The use of second order statistics for blind channel identification
was first exploited by Tong, Xu, and Kailath [1]. The basic concept hinges on
the signal and noise subspace separation through singular value decomposition
(SVD) of the auto-covariance matrix $R_{m_0}$.

The sub-channel matching (SCM) method presented in [3] and the subspace method
of [2] can both be posed as a minimum eigenvector problem under proper channel
length constraints. The special block Toeplitz structure is utilized in both
algorithms. When the channel length is over-estimated, common zeros must be
factorized out from the sub-channel estimates. As a result, both algorithms
are very sensitive to channel length mismatch.

In [5] and [6], a linear prediction algorithm (LPA) was presented for channel
estimation. It is shown to be more robust to over-estimated channel length.
However, as we will show later in this paper, the LPA only uses part of the
overall information because the channel estimate is based on the first $p$
columns of the estimated channel parameter vector outer-product matrix. As a
more robust and accurate channel estimation algorithm, the outer-product
decomposition algorithm we propose will exploit second order statistics more
effectively.

3 Algorithm Development

We will form an outer-product of the channel parameter matrix

$$\mathbf{h} = [\,H_0'\ \ H_1'\ \cdots\ H_{m_0}'\,]'. \tag{3.1}$$

Let

$$X[k] = [\,x_{kp}\ \ x_{kp+1}\ \cdots\ x_{kp+p-1}\,]' = \sum_{i=0}^{m_0} H_i\,\mathbf{s}_{k-i}'. \tag{3.2}$$

For notational convenience, define

$$R(n) \triangleq E\{X[k]\,X[k-n]^H\} = \sigma_s^2 \sum_{i=n}^{m_0} H_i H_{i-n}^H. \tag{3.3}$$


The channel output covariance matrix can be written as

$$R_{m_0} \triangleq E\{\mathbf{x}[k]\,\mathbf{x}^H[k]\} = \sigma_s^2\,\mathbf{H}\mathbf{H}^H + \sigma_w^2 I. \tag{3.4}$$

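Denote

$$H_a = \begin{bmatrix}
H_0 & H_1 & \cdots & H_{m_0} & 0 & \cdots & 0 \\
H_1 & H_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & \vdots \\
H_{m_0} & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}. \tag{3.5}$$

If we define $p \times p$ block matrices as

$$D_{i,j} \triangleq \sum_{l=i-1}^{m_0} H_l H_{l+j-i}^H, \quad 1 \le i, j \le m_0 + 1, \tag{3.6}$$

with $H_l = 0$ for $l > m_0$, it can be verified that

$$D_{m_0+1} \triangleq H_a H_a^H = \begin{bmatrix}
D_{1,1} & D_{1,2} & \cdots & D_{1,m_0+1} \\
D_{2,1} & D_{2,2} & \cdots & D_{2,m_0+1} \\
\vdots & & & \vdots \\
D_{m_0+1,1} & D_{m_0+1,2} & \cdots & D_{m_0+1,m_0+1}
\end{bmatrix}. \tag{3.7}$$

This matrix is an $(m_0 + 1)p \times (m_0 + 1)p$ Hermitian matrix. Now define a
new matrix as

$$\bar{D}_{m_0+1} = \begin{bmatrix}
D_{2,2} & \cdots & D_{2,m_0+1} & 0 \\
D_{3,2} & \cdots & D_{3,m_0+1} & 0 \\
\vdots & & \vdots & \vdots \\
D_{m_0+1,2} & \cdots & D_{m_0+1,m_0+1} & 0 \\
0 & \cdots & 0 & 0
\end{bmatrix}. \tag{3.8}$$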

We can form another Hermitian matrix from

$$A_{m_0+1} = D_{m_0+1} - \bar{D}_{m_0+1} = \mathbf{h}\mathbf{h}^H. \tag{3.9}$$

Clearly, matrix $A_{m_0+1}$ forms the outer-product of the channel parameter
matrix $\mathbf{h}$. Its singular value decomposition will generate
$\mathbf{h}Q$, in which $Q$ is an $N \times N$ unitary matrix. This ambiguity
is intrinsic to the multi-user blind identification problem and cannot be
resolved unless additional information is available. If a multi-channel
equalizer is built according to the estimate $\mathbf{h}Q$, the $N$ receiver
outputs will be memoryless combinations of the $N$ channel inputs and will
need to be separated.
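The identity $D_{m_0+1} - \bar{D}_{m_0+1} = \mathbf{h}\mathbf{h}^H$ can be checked numerically. The sketch below does this for a single real-valued user, where every $H_i$ is a length-$p$ vector. It assumes the block Hankel form of $H_a$ and keeps only its nonzero block columns, which does not change $H_a H_a^H$. All function names are hypothetical.

```python
def hankel_matrix(blocks):
    """Nonzero part of the block Hankel matrix H_a for one user (each H_i a p-vector)."""
    m0, p = len(blocks) - 1, len(blocks[0])
    Ha = [[0.0] * (m0 + 1) for _ in range((m0 + 1) * p)]
    for i in range(m0 + 1):  # block row i starts with H_i
        for j in range(m0 + 1 - i):
            for q in range(p):
                Ha[i * p + q][j] = blocks[i + j][q]
    return Ha


def gram(A):
    """Return A A^T (real-valued case)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]


def shifted(D, p):
    """Shift D up and left by one p x p block, zero-padding the last block row and column."""
    n = len(D)
    return [[D[r + p][c + p] if r + p < n and c + p < n else 0.0
             for c in range(n)] for r in range(n)]


def outer_product_matrix(blocks):
    """A = D - D_bar, which should equal h h^T with h = [H_0' ... H_m0']'."""
    p = len(blocks[0])
    D = gram(hankel_matrix(blocks))
    D_bar = shifted(D, p)
    n = len(D)
    return [[D[r][c] - D_bar[r][c] for c in range(n)] for r in range(n)]
```

With `blocks = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]`, every entry of the result matches the corresponding product of two stacked taps.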

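In order to estimate the $A_{m_0+1}$ matrix, first construct

$$R_c = \begin{bmatrix}
R(0) - \sigma_w^2 I & R(1) & \cdots & R(m_0) \\
R(1) & R(2) & \cdots & 0 \\
\vdots & & & \vdots \\
R(m_0) & 0 & \cdots & 0
\end{bmatrix} = \sigma_s^2 H_a \mathbf{H}^H. \tag{3.10}$$

In addition, it can also be easily shown that

$$R_{m_0} - \sigma_w^2 I = \sigma_s^2\,\mathbf{H}\mathbf{H}^H. \tag{3.11}$$

In order to estimate the product $H_a H_a^H$, it is important to note that when
$\mathbf{H}$ has full column rank, $\mathbf{H}^H\mathbf{H}$ is invertible and
$\mathbf{H}^H(\mathbf{H}\mathbf{H}^H)^{\#}\mathbf{H} = I$, where $\#$ denotes
the pseudo-inverse. Then

$$R_c\,(R_{m_0} - \sigma_w^2 I)^{\#} R_c^H = \sigma_s^2 H_a H_a^H, \tag{3.12}$$

where $\sigma_s^2$ is known.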

If  there  is  only  a  single  user,  the  channel  impulse 
response  vector  can  be  estimated  from  the  rank  one 
outer-product  matrix,  through  eigen-decomposition, 
QR  decomposition,  or  simply  post-multiplying  a  ran¬ 
dom  column  vector.  We  thus  name  the  method  “outer- 
product  decomposition  algorithm”  (OPDA). 
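The simplest of the three options mentioned above, post-multiplying the rank-one matrix by a column vector, can be sketched as follows for the real-valued case. The function name and the normalization step are assumptions of this example. The sign of the estimate is arbitrary, which mirrors the scalar ambiguity of blind identification.

```python
import math


def channel_from_rank_one(A, v):
    """Estimate h (up to sign) from A = h h^T by post-multiplying with a vector v."""
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]  # equals h * (h^T v)
    vAv = sum(x * y for x, y in zip(v, Av))                 # equals (h^T v)^2
    if vAv <= 0.0:
        raise ValueError("v is (numerically) orthogonal to h; pick another vector")
    scale = math.sqrt(vAv)
    return [x / scale for x in Av]
```

In practice `v` would be drawn at random. With noisy estimates of the outer-product matrix, an eigen-decomposition that keeps the dominant eigenvector is likely to be the more robust choice.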

Notice that OPDA requires two singular value (or eigenvalue) decompositions in
its implementation. Its complexity is therefore similar to that of the linear
prediction algorithm (LPA) presented by Meriam et al. [6], the TXK method [1],
and the sub-channel matching method [3]. However, LPA estimates the channel
only from the first $p$ columns of the outer-product matrix. If the channel
impulse response has weak precursor samples, then LPA is likely to be highly
inaccurate, since noise and numerical error will likely dominate the first few
columns of $A_{m_0+1}$. Therefore, OPDA is expected to provide more robust
performance than LPA.

4  Simulation  Results 

We now present simulation results to illustrate the channel identification
performance of the proposed OPDA. Our experiments are based on a single user
with a multi-path channel model. We consider a raised-cosine pulse $P(t)$
limited in $6T$ with roll-off factor 0.10 and a two ray multi-path channel

$$c(t) = \delta(t) - 0.7\,\delta(t - T/3).$$

The impulse response

$$h(t) = c(t) * P(t) = P(t) - 0.7\,P(t - T/3)$$

is shown in Figure 4. The data input signal is i.i.d. BPSK and the
oversampling factor is $p = 3$. In all our simulations, $M$ is chosen to be
twice as long as $P(t)$.
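The simulated channel can be reproduced with a short sketch. The text gives the roll-off factor 0.10, the $6T$ pulse length, the two-ray model and $p = 3$. Centring the truncation window on $t = 0$ and measuring time in units of $T$ are assumptions of this example, as are the function names.

```python
import math


def raised_cosine(t, beta=0.10, span=6.0):
    """Raised-cosine pulse P(t), t in units of T, truncated to |t| <= span / 2."""
    if abs(t) > span / 2.0:
        return 0.0
    if t == 0.0:
        return 1.0
    denom = 1.0 - (2.0 * beta * t) ** 2
    sinc = math.sin(math.pi * t) / (math.pi * t)
    if abs(denom) < 1e-12:  # removable singularity at |t| = 1 / (2 * beta)
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * beta * t) / denom


def channel_samples(p=3, span=6.0, beta=0.10):
    """Samples of h(t) = P(t) - 0.7 P(t - T/3) taken every T / p over the pulse span."""
    half = int(span / 2.0 * p)
    times = [k / p for k in range(-half, half + 1)]
    return [raised_cosine(t, beta, span) - 0.7 * raised_cosine(t - 1.0 / 3.0, beta, span)
            for t in times]
```

With $p = 3$ the $T/3$ delay of the second ray is exactly one sample, so the sampled response is the sampled pulse minus 0.7 times its one-sample shift.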


In  the  first  set  of  simulation  tests,  we  compare  the 
two  methods  OPDA  and  LPA  based  on  100  and  200 
bauds  of  channel  output  samples.  The  channel  or¬ 
der  is  unknown  and  is  estimated  using  the  MDL  cri¬ 
terion.  The  normalized  mean  square  error  (MSE)  of 
the  channel  estimate  under  different  channel  SNR  lev¬ 
els  is  shown  in  Figure  1. 


Figure 1: Normalized MSE of channel estimate given different SNR levels.
(a) Data length = 100T; (b) data length = 200T.

For  several  different  data  lengths,  the  resulting  nor¬ 
malized  MSE  is  shown  in  Figure  2.  Once  again,  the 
results  show  that  OPDA  and  LPA  are  equally  ineffec¬ 
tive  when  SNR  is  low.  But  when  the  SNR  is  higher, 
OPDA  out-performs  LPA  significantly. 

We also tested the comparative robustness of the two algorithms when channel
mismatching is present. Fixing SNR = 20 dB, we manually varied the channel
length estimate from 2T to 10T. Notice that the true channel length is 6T. The
results clearly show that while LPA is less sensitive to errors in channel
order estimate, its performance is generally much worse than that of OPDA.
When the channel order estimate deviates modestly from the true channel order,
OPDA generates a much smaller normalized MSE.
Figure 2: Normalized MSE of channel estimate given different data lengths.
Panels: SNR = 10 dB; SNR = 25 dB.

Figure 3: Normalized MSE of channel estimate given channel length mismatch.

Figure 4: 50 independent channel estimates.

Finally, we compare a group of typical impulse responses estimated from 50
independent trials of the OPDA and LPA under 20 dB SNR and a data length of
L = 400T, assuming the channel length is correctly estimated. The estimated
impulse responses are shown in Figure 4.

5  Conclusions 

We present a new robust and accurate blind channel identification algorithm,
OPDA, based on outer-product decomposition. This new algorithm can be viewed
as a generalization of the recently proposed linear prediction algorithm
(LPA). The new OPDA is capable of generating far superior identification
results.
Abstract

The problem of separating superimposed digitally modulated signals using an
array of antennas is considered. The proposed method exploits the finite
alphabet structure to demodulate one signal at a time, resulting in a
computationally efficient solution. The resulting signal estimates are shown
to be exact in the noise-free case. In noisy scenarios, the performance is
comparable with that of the recently proposed iterative least squares
approach, which demodulates all signals simultaneously at a higher
computational cost.


1  Introduction 

Array  processing  techniques  can  be  used  to  dis¬ 
criminate  between  spatially  separated  co-channel 
signals,  and  can  consequently  increase  the  capac¬ 
ity  in  wireless  communication  systems.  This  pa¬ 
per  discusses  how  to  reliably  demodulate  one  or 
more  desired  signals  of  interest  (SOI)  from  the 
output  of  an  array,  in  the  presence  of  other  co¬ 
channel  signals  and  noise.  Traditional  approaches 
exploit  the  spatial  structure  of  the  array,  and 
as  such  depend  on  high-resolution  estimates  of 
the  DOA’s  (Direction  Of  Arrival)  of  the  incom¬ 
ing  signals.  Since  modern  wireless  communica¬ 
tion  systems  are  characterized  by  a  highly  vari¬ 
able  propagation  environment,  this  spatial  struc¬ 
ture  is  poorly  defined  [3].  On  the  other  hand, 
these  methods  make  no  assumptions  about  the 
signals  themselves,  and  are  thus  not  exploiting 
the  structural  information  present  in  the  signals. 
Various  blind  copy  algorithms  have  been  proposed 
to  alleviate  this  problem  [1],[10].  The  referenced 
techniques  require  synchronized  signals  and  must 
assume flat frequency fading. Generalizations are
considered in [2, 6, 9] and [10]. This paper proposes
a new approach, based on decoupling the
estimation problem (i.e., treating one signal at a
time). This leads to an algorithm with similar
or  better  performance  for  a  typical  scenario,  and 
furthermore  reduces  the  computational  cost  in¬ 
volved  in  the  estimation  procedure  significantly. 
These  claims  are  supported  by  simulation  results 
and  a  complexity  count.  Consistency  and  unique¬ 
ness  issues  are  also  addressed. 

2  Signal  Model 

With $d$ synchronized signals arriving at an $m$-element antenna array, the
complex output vector after matched filtering and symbol-rate sampling can be
expressed by the following familiar equation

$$\mathbf{x}(n) = \mathbf{A}\,\mathbf{s}(n) + \mathbf{v}(n) \tag{1}$$

where $\mathbf{A}$ is the collection of total array response vectors (spatial
signatures), scaled by the signal amplitudes,

$$\mathbf{A} = [\,p_1\mathbf{a}_1\ \cdots\ p_d\mathbf{a}_d\,], \tag{2}$$

$\mathbf{s}(n) = [\,b_1(n)\ \cdots\ b_d(n)\,]^T$, $b_i(n) = \pm 1$ (BPSK), and
$\mathbf{v}(n)$ is spatially and temporally white noise. For simplicity we
consider BPSK signals, but the extension to arbitrary linear modulation
formats is straightforward. A block formulation is obtained by taking $N$
snapshots, yielding

$$\mathbf{X}(N) = \mathbf{A}\,\mathbf{S}(N) + \mathbf{V}(N) \tag{3}$$

where $\mathbf{X}(N) = [\,\mathbf{x}(1)\ \cdots\ \mathbf{x}(N)\,]$,
$\mathbf{S}(N) = [\,\mathbf{s}(1)\ \cdots\ \mathbf{s}(N)\,]$, and
$\mathbf{V}(N) = [\,\mathbf{v}(1)\ \cdots\ \mathbf{v}(N)\,]$. The spatial
structure of the data is represented by $\mathbf{A}$, while the matrix
$\mathbf{S}$ represents the temporal structure.

By defining one signal (at a time) to be the signal of interest (SOI), (3) can
be rewritten in the following way

$$\mathbf{X}(N) = \mathbf{a}_1\mathbf{s}_1 + \sum_{i=2}^{d} \mathbf{a}_i\mathbf{s}_i + \mathbf{V}(N) = \mathbf{a}_1\mathbf{s}_1 + \mathbf{J}(N) \tag{4}$$

where the first signal is taken to be the SOI, without loss of generality. The
term $\mathbf{J}(N)$ thus corresponds to interfering signals plus noise.

3 Decoupled Symbol Estimation: Algorithm

Since it is desired to estimate the signals with little or no spatial
knowledge, the idea is to iteratively estimate $\mathbf{a}$ and $\mathbf{s}$,
based on the formulation in (4).

3.1 Algorithm

Given an initial estimate $\hat{\mathbf{A}} = [\,\hat{\mathbf{a}}_1\ \cdots\ \hat{\mathbf{a}}_d\,]$
of $\mathbf{A}$, the following weighted least-squares criterion function is
minimized

$$\min_{\mathbf{a},\mathbf{s}} (\mathbf{X} - \mathbf{a}\mathbf{s})^* \mathbf{W} (\mathbf{X} - \mathbf{a}\mathbf{s}) = \min_{\mathbf{a},\mathbf{s}} \|\mathbf{W}^{1/2}(\mathbf{X} - \mathbf{a}\mathbf{s})\|_F^2. \tag{5}$$

Here, $\mathbf{W}$ should ideally be chosen as $\mathbf{R}_J^{-1}$ [5], which
can be interpreted as a prewhitening of the data vector $\mathbf{x}(n)$.
However, it can easily be shown, using the matrix inversion lemma, that using
the inverse of the sample estimate of the covariance of the array output
produces asymptotically (for large $N$) equivalent signal estimates. Equation
(5) can thus be reformulated as follows

$$\min_{\mathbf{b},\mathbf{s}} \|\mathbf{Z} - \mathbf{b}\mathbf{s}\|_F^2 \tag{6}$$

with $\mathbf{Z} = \mathbf{R}_x^{-1/2}\mathbf{X}$,
$\mathbf{b} = \mathbf{R}_x^{-1/2}\mathbf{a}$, and
$\mathbf{R}_x = \frac{1}{N}\mathbf{X}\mathbf{X}^*$. The solution to (6) w.r.t.
$\mathbf{s}$ is

$$\hat{\mathbf{s}} = (\mathbf{b}^*\mathbf{b})^{-1}\mathbf{b}^*\mathbf{Z} = \frac{1}{\|\mathbf{b}\|^2}\,\mathbf{b}^*\mathbf{Z}. \tag{7}$$

Exploiting the finite-alphabet property, this solution is projected onto its
closest discrete values in the signal space ($\pm 1$). The (modified) steering
vector $\mathbf{b}$ is then updated by minimizing (6) w.r.t. $\mathbf{b}$. The
solution is

$$\hat{\mathbf{b}} = \mathbf{Z}\mathbf{s}^*(\mathbf{s}\mathbf{s}^*)^{-1} = \frac{1}{N}\,\mathbf{Z}\mathbf{s}^*. \tag{8}$$

Note that (8) is simply a temporally matched filter to the current signal
estimate, whereas (7) represents a spatially matched filter. The process is
repeated until $\mathbf{s}$ converges, after which the algorithm continues
with the next signal.

3.2 Consistency and Uniqueness

A  relevant  question  is  whether  or  not  the  al¬ 
gorithm  is  able  to  “capture”  the  transmitted  sig¬ 
nals.  Since  the  iterative  scheme  corresponds  to 
a  relaxed  optimization  procedure,  it  is  a  simple 
matter  to  show  that  it  is  guaranteed  to  converge 
to  a  local  minimum.  Whether  or  not  this  corre¬ 
sponds  to  a  “true”  minimum  depends  in  general 
on  the  initial  estimate.  However,  even  if  it  does, 
it  is  a  non-trivial  question  if  the  global  minimum 
yields  a  consistent  estimate  of  the  transmitted 
waveform.  Clearly,  this  is  possible  only  for  high 
enough  signal-to-noise  ratio  (SNR),  so  we  will  an¬ 
alyze  the  quality  of  the  global  minimum  assuming 
that  the  noise  variance  tends  to  zero. 

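Substituting the solution for $\mathbf{b}$ in (8) into (6) gives the following
minimization problem

$$\min_{\mathbf{s}} \left\|\mathbf{Z} - \tfrac{1}{N}\mathbf{Z}\mathbf{s}^*\mathbf{s}\right\|_F^2 = \min_{\mathbf{s}} \left\|\mathbf{Z} - \mathbf{Z}\,\frac{\mathbf{s}^*\mathbf{s}}{N}\right\|_F^2. \tag{9}$$

Reformulating in terms of projection matrices,

$$\min_{\mathbf{s}} \|\mathbf{Z}(\mathbf{I} - \mathbf{P}_{s^*})\|_F^2 \iff \max_{\mathbf{s}} \|\mathbf{Z}\mathbf{P}_{s^*}\|_F^2 \iff \max_{\mathbf{s}} \|\mathbf{Z}\mathbf{s}^*\|^2, \tag{10}$$

where the last equivalence follows since for BPSK signals
$\mathbf{P}_{s^*} = \mathbf{s}^*(\mathbf{s}\mathbf{s}^*)^{-1}\mathbf{s} = \mathbf{s}^*\mathbf{s}/N$.
Furthermore, by using $\mathbf{Z} = \mathbf{R}_x^{-1/2}\mathbf{X}$, the
following can easily be derived

$$\max_{\mathbf{s}} \|\mathbf{Z}\mathbf{s}^*\|^2 \iff \max_{\mathbf{s}} \{\mathbf{s}\mathbf{P}_{X^*}\mathbf{s}^*\}. \tag{11}$$

Using the Schwarz inequality,

$$\mathbf{s}\mathbf{P}_{X^*}\mathbf{s}^* \le \|\mathbf{s}\|^2\,\|\mathbf{P}_{X^*}\|_2 = N, \tag{12}$$

with equality when $\mathcal{R}(\mathbf{s}^*) \subset \mathcal{R}(\mathbf{P}_{X^*})$,
i.e., $\hat{\mathbf{s}}^* = \mathbf{X}^*\mathbf{t}$ for some column vector
$\mathbf{t}$. In the noise-free case we have $\mathbf{X} = \mathbf{A}\mathbf{S}_0$,
giving

$$\hat{\mathbf{s}}^* = (\mathbf{A}\mathbf{S}_0)^*\mathbf{t} = \mathbf{S}_0^*\,\bar{\mathbf{t}}, \tag{13}$$

where $\bar{\mathbf{t}} = \mathbf{A}^*\mathbf{t}$. Thus, the signal estimate
converges to a linear combination of the $d$ transmitted signals. Under
suitable "persistence of excitation" conditions, $\bar{\mathbf{t}}$ must
contain a $\pm 1$ in one position and zeros otherwise, implying that
$\hat{\mathbf{s}}$ is one of the true transmitted signals with a possible sign
change. Specifically, since all signal vectors
$\hat{\mathbf{s}}_1, \ldots, \hat{\mathbf{s}}_d$ are treated likewise, we can
write for the $d$ signal estimates

$$[\,\hat{\mathbf{s}}_1^*\ \cdots\ \hat{\mathbf{s}}_d^*\,] = [\,\mathbf{s}_{0,1}^*\ \cdots\ \mathbf{s}_{0,d}^*\,]\,\mathbf{T}. \tag{14}$$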

In [10] it is shown that the above can hold only if $\mathbf{T}$ is a diagonal
matrix with $\pm 1$ entries, or a permutation matrix, or a product of the two,
provided that the columns of $\mathbf{S}_0$ include all the $2^d$ possible
distinct $d$-vectors with $\pm 1$ elements. The latter is a mild condition,
which is satisfied in most cases of practical interest. We conclude that the
global minimizer of the decoupled criterion function converges to any of the
$d$ transmitted signals as the noise variance tends to zero.
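The noise-free behaviour can be illustrated with a small real-valued sketch of the alternating updates: project the spatially matched filter output onto $\pm 1$, then re-estimate the whitened steering vector by temporal matched filtering. To stay dependency-free, the example is restricted to $m = d = 2$ so that $\mathbf{R}_x^{-1/2}$ has a closed form. The mixing matrix, the symbol block and all function names are assumptions made for illustration only.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv_sqrt_2x2(R):
    """R^{-1/2} for a symmetric positive definite 2 x 2 matrix (closed form)."""
    (a, b), (_, d) = R
    s = math.sqrt(a * d - b * b)        # sqrt(det R)
    t = math.sqrt(a + d + 2.0 * s)      # sqrt(R) = (R + s I) / t
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det = ra * rd - rb * rb
    return [[rd / det, -rb / det], [-rb / det, ra / det]]


def decoupled_estimate(X, a_init, max_iter=20):
    """Alternate the s-update and b-update for one real BPSK signal; returns s."""
    N = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    Rx = [[v / N for v in row] for row in matmul(X, Xt)]   # sample covariance
    W = inv_sqrt_2x2(Rx)
    Z = matmul(W, X)                                       # prewhitened data
    b = [row[0] for row in matmul(W, [[a] for a in a_init])]
    s = None
    for _ in range(max_iter):
        proj = [sum(b[i] * Z[i][n] for i in range(len(b))) for n in range(N)]
        s_new = [1.0 if v >= 0.0 else -1.0 for v in proj]  # project onto +/-1
        if s_new == s:
            break
        s = s_new
        b = [sum(Z[i][n] * s[n] for n in range(N)) / N for i in range(len(b))]
    return s
```

Starting from a rough guess of either spatial signature, the iteration settles on the corresponding transmitted sequence. Which signal is captured depends on the initial estimate, in line with the local-minimum remark above.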

4  Performance  and  Complexity 

4.1  Performance 

Figure  1  below  shows  the  results  of  a  simula¬ 
tion  comparing  the  performance  of  the  proposed 
algorithm  with  that  of  ILSP  [8].  A  total  of  d  =  3 
signals  are  impinging  on  a  4-element  uniform  lin¬ 
ear  array  (ULA),  and  the  BER  vs.  SNR  is  evalu¬ 
ated.  A  5  bit  training  sequence  was  used  to  get  an 
initial estimate of the steering vectors A. The results
clearly show that improved performance
has been obtained in this scenario (the BER of
the signal with DOA = 106° was ≈ 0.25 for both
algorithms, regardless of SNR).


Figure  1.  Performance  of  decoupled  WLS 
approach  and  ILSP  algorithm.  Simulated 
data. 

The algorithm was also tested on a real dataset collected at the University of
Texas at Austin, and compared to ILSP. Two closely spaced sources transmitted
bursts of 198 symbols/burst (still BPSK), and 4 antenna elements were used at
the receiver. Different noise realizations were then generated and added to
the data in order to evaluate the BER vs. SNR performance (for the strongest
signal only). The results are given in Figure 2, and demonstrate that the two
algorithms perform similarly in this scenario.


Figure  2.  Performance  of  decoupled  WLS 
approach  and  ILSP  algorithm.  Real  data. 


In general, one can say that the decoupled algorithm outperforms ILSP for
large burst lengths and a small array. The explanation for this is that the
approximation $\mathbf{R}_x^{-1} \propto \mathbf{R}_J^{-1}$, used in Section
3.1, improves with larger $N$ and smaller $m$. On the other hand, in scenarios
with $m \gg d$ and $N$ relatively small, it is our experience that ILSP gives
a slight improvement compared to our proposed method.


Figure  3.  Complexity  of  decoupled  WLS  ap¬ 
proach  and  ILSP  algorithm 


4.2 Complexity

Since the proposed algorithm is based on the same iterative approach as ILSP,
it is interesting to compare the complexity of the two methods. Before the
iterative estimation of $\mathbf{s}$ and $\mathbf{b}$ begins,
$\mathbf{R}_x^{-1/2}$ and the product $\mathbf{R}_x^{-1/2}\mathbf{X}$ must be
computed. This requires $O(m^3) + m^2 N$ flops. Note that this computation is
only carried out once for a given block of data $\mathbf{X}$ ($m \times N$).
Looking at eqn. (7), it is sufficient to compute the product of the
$\mathbf{b}^*$ ($1 \times m$) vector with the modified data matrix
$\mathbf{Z}$ ($m \times N$), requiring $2mN$ flops [4]. Similarly, to update
the $\mathbf{b}$ estimate, the same kind of product is computed. This gives a
total of $4mN$ flops per iteration and signal. A similar count for ILSP
results in $2Nmd + 2d^2(N - d/3) + md^2$ flops to solve for $\mathbf{A}$ and
$2Nmd + 2d^2(m - d/3) + Nd^2$ flops to solve for $\mathbf{S}$ (both per
iteration). Consequently, the proposed algorithm results in a significant
reduction in computational complexity as compared to the ILSP algorithm.

In  order  to  get  a  fair  comparison  of  complexity, 
one  should  also  look  at  the  convergence  properties 
of  the  two  algorithms.  The  number  of  iterations 
required  for  convergence  is  compared  in  figure  3 
(same  scenario  as  in  4.1).  Even  if  the  total  num¬ 
ber  of  iterations  for  the  proposed  algorithm  (add 
the  three  solid  lines)  exceed  that  of  ILSP  in  this 
case,  it  does  not  offset  the  large  difference  in  com¬ 
plexity  in  terms  of  flops  count. 

As  an  illustration,  the  following  typical  num¬ 
bers  were  obtained  using  a  flops  count  in  MAT- 
LAB  for  the  scenario  above  at  SNR  =  bdB:  ILSP 
requires (with ≈ 10.4 iterations on the average, see Figure 3)

(10.4 iter)(32650 flops/iter) ≈ 339000 flops.

The decoupled algorithm requires a total of

(6552 flops/iter)(6.2 + 6.5 + 4.2 iter) ≈ 110700 flops

in addition to the inversion of $\mathbf{R}_x$ (only once!).
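The arithmetic behind these totals is easy to reproduce. The sketch below uses only the per-iteration flop counts and mean iteration counts quoted in the text. The function and variable names are assumptions of this example.

```python
def decoupled_flops_per_iteration(m, N):
    """4 m N flops per iteration and signal: 2 m N for the s-update, 2 m N for the b-update."""
    return 4 * m * N


# Figures quoted in the text: measured flops per iteration and mean iteration counts.
ilsp_total = 10.4 * 32650
decoupled_total = 6552 * (6.2 + 6.5 + 4.2)   # three signals demodulated one at a time
ratio = ilsp_total / decoupled_total
```

The two products come to about 339560 and 110729 flops, so ILSP needs roughly three times as many flops as the decoupled algorithm in this scenario, not counting the one-off matrix inversion.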

5  Conclusion 

The  simulation  results  indicate  that  the  pro¬ 
posed  algorithm  has  similar  or  improved  per¬ 
formance  compared  to  ILSP,  and  that  this  im¬ 
provement  is  accompanied  by  a  significant  reduc¬ 
tion  in  computational  complexity.  This  is  par¬ 
ticularly  notable  if  not  all  signals  are  of  inter¬ 
est.  The  method  can  be  extended  in  a  straight¬ 
forward  manner  in  order  to  include  the  case  of 
non-synchronized  users  and  time-dispersive  chan¬ 
nels,  using  e.g.  conventional  synchronization 
and equalization techniques [7]. Simulations performed
by the authors (not included here due to
space limitations) have confirmed this claim.
Abstract

The objective of this paper is to introduce a statistical
and physically based mechanism giving rise to α-stable
noise models. We show that the additive interference
which is present in many environments can be
modeled as symmetric α-stable by assuming: (i) independent
signaling (effects) from a large number of interferers
of the same type (modulation); (ii) Poisson
distribution of interferers in space; and (iii) inverse
power attenuation of signal strength with distance. Our
approach to α-stable noise modeling is based on the
LePage  series  representation  [5]  as  opposed  to  the  influ¬ 
ence  function  approaches  taken  in  [IfJS].  The  formulas 
derived  are  used  to  predict  noise  statistics  in  environ¬ 
ments  with  lognormal  shadowing  and  Rayleigh  fading. 
The  LePage  series  framework  allows  us  to  investigate 
practical  constraints  in  the  system  model  adopted,  such 
as  the  finite  number  of  interferers  and  nonhomogeneous 
Poisson  fields  of  interferers. 


1  Introduction 

The  characterization  of  the  corrupting  noise  distri¬ 
bution  is  an  important  requirement  for  most  system 
design  problems  because  it  leads  to  the  development 
of  noise  suppression  methods.  The  most  widely  used 
noise  model  is  the  Gaussian  random  process.  How¬ 
ever,  in  some  natural  environments,  the  Gaussian  noise 
model  may  not  be  appropriate.  This  is  evident  from 
a  higher  probability  of  large  amplitude  values  than  is 
consistent  with  Gaussian  distributions.  A  number  of 
models  have  been  proposed  for  such  impulsive  phe¬ 
nomena,  either  by  fitting  experimental  data  or  based 
on physical grounds. Recently, it has been suggested
that  among  all  the  heavy-tailed  distributions,  the  fam¬ 
ily  of  stable  distributions  provides  the  most  accurate 
model  for  impulsive  noise  [1],[8].  In  communications, 
stable  noise  models  have  been  verified  experimentally 
in  various  underwater  communications  and  radar  ap¬ 
plications  [1],  [9].  Stable  distributions  share  defining 
characteristics  with  Gaussian  distributions,  such  as  the 
stability  property  and  the  generalized  central  limit  the¬ 
orem  (GCLT),  and,  in  fact,  include  Gaussian  distribu¬ 
tions  as  a  limiting  case.  Because  stable  distributions, 
except  for  the  Gaussian  case,  have  infinite  variance, 
at  first  sight,  it  appears  that  stable  noise  models  do 
not  have  the  wide  applicability  enjoyed  by  second-order 
processes.  However,  in  this  paper  we  present  a  realistic 
physical  mechanism  giving  rise  to  stable  noise.  We  do 
this  by  considering  the  nature  of  noise  sources,  their 
distributions  in  time  and  space,  and  propagation  con¬ 
ditions. 

2  System  Model 

In our system model, a detector is located at the center of a plane where
there is a large number ($N \to \infty$) of transmitters using the same power
and modulation. The distances between the detector and interfering terminals
are denoted as $r_i$, where $r_1 < r_2 < \cdots < r_N$. In general, after the
correlation detection of passband interference, the interfering signal is
represented as an $n$-dimensional vector given by

$$\mathbf{Y} = \lim_{N \to \infty} \sum_{i=1}^{N} a(r_i)\,\mathbf{X}_i, \tag{1}$$

where $a(r)$ represents the signal attenuation over distance $r$, and
$\mathbf{X}_i = [X_{i1}, \ldots, X_{in}]$ is a random vector with $n$
coordinates $X_{ij}$, $j = 1, \ldots, n$, which are real random variables
(RVs). The $j$th coordinate of $\mathbf{X}_i$ is the correlation of $x_i(t)$
with the function $\varphi_j(t)$.¹ In this paper, we are concerned with
characterizing the distribution of $\mathbf{Y}$, and in order to do this, we
assume that $\mathbf{X}_i$ are spherically symmetric² (SS) RVs. Because all
interfering terminals use the same modulation scheme and transmit at the same
power, it is reasonable to assume that the random vectors $\mathbf{X}_i$ are
i.i.d. Moreover, the distribution of $\mathbf{X}_i$ is independent of $n$. To
explain the noise modeling in (1), it is useful to consider a system with
on-off frequency shift keying (FSK) and noncoherent detection. In this system,
$\varphi_1(t) = \cos(2\pi f_0 t)$ and $\varphi_2(t) = \sin(2\pi f_0 t)$, and
the projection of $x_i(t)$ onto $\varphi_1(t)$ and $\varphi_2(t)$ results in
$\mathbf{X}_i = [\cos(\theta_i), \sin(\theta_i)]$, where $\theta_i$ is
uniformly distributed in $(0, 2\pi]$. This means that $\mathbf{X}_i$ is
circularly symmetric (CS), a bivariate case of a SS vector. With respect to
the terminal positions, we assume that terminals form a Poisson point process
with the expected number of terminals per unit area/volume given by $\lambda$
[6].


3  Stable  Interference  Models  in  Envi¬ 
ronments  with  Deterministic  Propa¬ 
gation  Laws 


We assume initially that the signal amplitude loss function over distance $r$
is given by the following deterministic propagation law

$$a(r) = \frac{K}{r^m}, \tag{2}$$

where the constant $K$ depends on the transmitted power. The attenuation
factor $m$ can vary from slightly more than 1 for hallways within buildings to
larger than 3 for dense urban environments and office buildings. Combining (1)
and (2), the noise equation is

$$\mathbf{Y} = \sum_{i=1}^{\infty} \frac{K}{r_i^m}\,\mathbf{X}_i. \tag{3}$$

In Appendix A, we sketch the proof of the following:

Theorem 1 If the RVs $\{\mathbf{X}_i\}$ are i.i.d. and SS and the
interferers/scatterers form a Poisson field, then the characteristic function
of the interference vector $\mathbf{Y}$ in (3) is SS α-stable, i.e.,

$$\Phi_Y(\mathbf{t}) = \exp(-\gamma\,\|\mathbf{t}\|^{\alpha}), \tag{4}$$

where $\alpha = 2/m$ and $\alpha = 3/m$ for interferers distributed in the
plane and volume, respectively. The parameter $\gamma$, called dispersion, is
given as

$$\gamma = -\lambda V K^{\alpha} \int_0^{\infty} \frac{\Phi_0'(x)}{x^{\alpha}}\,dx, \tag{5}$$

or equivalently as

$$\gamma = \lambda V K^{\alpha}\,C_{\alpha}^{-1}\,E|X_{ij}|^{\alpha}, \tag{6}$$

where $\Phi_0(\|\mathbf{t}\|) = \Phi_X(\mathbf{t})$ is a generating
characteristic function of the SS RVs $\mathbf{X}_i$,³ $'$ denotes
differentiation,
$C_{\alpha}^{-1} = \Gamma(2-\alpha)\cos(\pi\alpha/2)/(1-\alpha)$, the constant
$V = \pi$ for interferers in the plane, and $V = \frac{4}{3}\pi$ for scatterers
in the volume.

¹ The projection of a continuous-time waveform $x_i(t)$ transmitted by the
$i$-th terminal onto $\varphi_j(t)$, or equivalently the correlation of these
two, is given as $X_{ij} = \int_0^T \varphi_j(t)\,x_i(t)\,dt$, where $T$ is a
symbol interval.

² The random vector $\mathbf{X}$ is said to be SS if its characteristic
function $\Phi_X(\mathbf{t})$ depends only on the Euclidean ($L_2$) norm of
$\mathbf{t}$, i.e., $\Phi_X(\mathbf{t}) = \Phi(\|\mathbf{t}\|)$, where
$\|\mathbf{t}\| = \sqrt{t_1^2 + \cdots + t_n^2}$.

For $\mathbf{X}_i = [\cos(\theta_i), \sin(\theta_i)]$ (non-coherent on-off FSK), with $\theta_i$ uniformly distributed in $(0, 2\pi]$, we have $\Phi_0(x) = J_0(x)$, where $J_\nu$ is a $\nu$th order Bessel function of the first kind [4]. This model for $\mathbf{X}_i$ is assumed in many radar applications. Because $J_0'(x) = -J_1(x)$, formula 6.561.17 from [2] can be used in (5) to calculate that the dispersion of the SS stable RV $\mathbf{Y}$ in (3), with the deterministic power propagation law as in (2), is given for $0.5 < \alpha < 2$ by

$$\gamma_{\mathrm{determ}} = \lambda V K^{\alpha}\, \frac{\Gamma(1 - \alpha/2)}{2^{\alpha}\, \Gamma(1 + \alpha/2)}. \qquad (7)$$

In this equation, the admissible range of the path loss exponent is $1 < m < 4$ for interferers distributed in the plane, and $\frac{3}{2} < m < 6$ for scatterers distributed in the volume.
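As a numerical sanity check, the closed form (7) can be compared with the moment form (6) for the FSK model $X_{ij} = \cos\theta_i$. The sketch below is illustrative only: the function names, the default values of `lam`, `V` and `K`, and the midpoint-rule evaluation of $E|\cos\theta|^{\alpha}$ are assumptions, not part of the paper.

```python
import math

def c_alpha_inv(alpha):
    # C_alpha^{-1} = Gamma(1 - alpha) * cos(pi * alpha / 2); alpha = 1 must be avoided (pole)
    return math.gamma(1.0 - alpha) * math.cos(math.pi * alpha / 2.0)

def abs_cos_moment(alpha, n=200_000):
    # E|cos(theta)|^alpha for theta uniform on (0, 2*pi], evaluated with the midpoint rule
    h = 2.0 * math.pi / n
    return sum(abs(math.cos((k + 0.5) * h)) ** alpha for k in range(n)) / n

def gamma_via_eq6(alpha, lam=1.0, V=math.pi, K=1.0):
    # Moment form: lambda * V * K^alpha * C_alpha^{-1} * E|X_ij|^alpha
    return lam * V * K ** alpha * c_alpha_inv(alpha) * abs_cos_moment(alpha)

def gamma_via_eq7(alpha, lam=1.0, V=math.pi, K=1.0):
    # Closed form: lambda * V * K^alpha * Gamma(1 - alpha/2) / (2^alpha * Gamma(1 + alpha/2))
    return lam * V * K ** alpha * math.gamma(1.0 - alpha / 2.0) / (
        2.0 ** alpha * math.gamma(1.0 + alpha / 2.0))

if __name__ == "__main__":
    for a in (0.6, 0.8, 1.5):
        print(a, gamma_via_eq6(a), gamma_via_eq7(a))
```

The two columns printed for each $\alpha$ should agree to within the quadrature error of the midpoint rule.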


4 Stable Interference Models in Lognormal Shadowing and Rayleigh Fading Environments


So far we have assumed that the received signal strength decreases with range raised to some exponent. However, experimental results show that this is only the average behavior of the signal. The received signal at a fixed range is not constant because of different terrain characteristics and statistical fluctuations in propagation conditions. Typically, the following random effects should be included in a study: (i) the random link attenuation resulting from lognormal shadowing and (ii) Rayleigh fading.

In the presence of lognormal shadowing, the pdf of the signal strength is of the form [3]

$$p(a(r) \mid \bar{a}(r)) = \frac{1}{\sqrt{2\pi}\,\sigma\, a(r)} \exp\left(-\frac{[\ln a(r) - \ln \bar{a}(r)]^2}{2\sigma^2}\right), \qquad (8)$$


where $\bar{a}(r) = K/r^m$ is the median of $a(r)$ as given in (2) and $\sigma = 0.5\sigma_s$. The parameter $\sigma_s$ is the standard deviation of the instantaneous power, and it depends on the environment. Values of $\sigma_s$ on the order of 8 to 10 dB are reported in the literature [3]. So in order to include the lognormal shadowing effect in our model, we have to consider $a(r)$ in (1) to be a RV given as $\frac{K}{r^m}\exp(\sigma G)$, where $G$ is the standard Gaussian RV with zero mean ($G \sim \mathcal{N}(0,1)$). The interfering signal is then

$$\mathbf{Y} = \sum_{i=1}^{\infty} \frac{K}{r_i^m} \exp(\sigma G_i)\, \mathbf{X}_i. \qquad (9)$$

^Here, $\Phi_0(x)$ is a function of the scalar variable $x = \|\mathbf{t}\|$, which for a SS RV $\mathbf{X}_i$ does not depend on n.

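We assume here that the $G_i$ are i.i.d. The hypothesis of independence between shadowing effects from different users is generally accepted [4]. Therefore, we can apply Theorem 1 in (9) with $\mathbf{X}_i$ replaced by $\exp(\sigma G_i)\mathbf{X}_i$.^ Then, in environments with lognormal shadowing, $\mathbf{Y}$ is again $\alpha$-stable with $\alpha = 2/m$ and $\alpha = 3/m$ for interferers distributed in the plane and volume, respectively. To calculate the dispersion, we use (6):

$$\begin{aligned}
\gamma &= \lambda V K^{\alpha} C_{\alpha}^{-1} E\,|\exp(\sigma G_i) X_{ij}|^{\alpha} \\
&= \lambda V K^{\alpha} C_{\alpha}^{-1} E|X_{ij}|^{\alpha}\, E|\exp(\sigma G_i)|^{\alpha} \\
&= \gamma_{\mathrm{determ}} \exp\left(\tfrac{1}{2}\alpha^2\sigma^2\right),
\end{aligned} \qquad (10)$$

where $\gamma_{\mathrm{determ}}$ is the dispersion of the corresponding system with the deterministic power propagation law. The last equality in (10) follows from the first moment relation for lognormal RVs, $E[\exp(\alpha\sigma G_i)] = \exp(\alpha^2\sigma^2/2)$.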
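The lognormal moment relation behind the last line of (10) is easy to check by simulation. The following is a minimal sketch, not the authors' code; in particular, the dB conversion in `sigma_from_db` (standard deviation in dB multiplied by ln(10)/10, then halved as in $\sigma = 0.5\sigma_s$) is an assumption about how the 8 to 10 dB figures map to natural-log units.

```python
import math
import random

def lognormal_factor(alpha, sigma):
    # Closed form of E[exp(alpha * sigma * G)] for G ~ N(0, 1)
    return math.exp(0.5 * alpha ** 2 * sigma ** 2)

def lognormal_factor_mc(alpha, sigma, n=200_000, seed=0):
    # Monte Carlo estimate of the same expectation (seeded for repeatability)
    rng = random.Random(seed)
    return sum(math.exp(alpha * sigma * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def sigma_from_db(sigma_s_db):
    # ASSUMPTION: power standard deviation in dB -> natural-log units, then sigma = 0.5 * sigma_s
    return 0.5 * sigma_s_db * math.log(10.0) / 10.0

if __name__ == "__main__":
    alpha, sigma = 1.5, 0.5
    print(lognormal_factor(alpha, sigma), lognormal_factor_mc(alpha, sigma))
    print(lognormal_factor(alpha, sigma_from_db(10.0)))
```

The closed form grows quickly with both $\alpha$ and $\sigma$, which is consistent with the dispersion factor being an increasing function of $\alpha$.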

If  a{r)  is  Rayleigh  distributed,  for  a  given  r,  a{r)  can 
be  represented  [3]  as  o(r)  =  where  the  RV 

$R = \sqrt{G_I^2 + G_Q^2}$ is Rayleigh distributed with $G_I, G_Q \sim \mathcal{N}(0,1)$. Then, we have to substitute the correspondingly scaled $R_i \mathbf{X}_i$ for $\mathbf{X}_i$ in Theorem 1, and $\mathbf{Y}$ is $\alpha$-stable with the same characteristic exponent as in the deterministic power propagation scenario. The dispersion is calculated in the same fashion as in (10). Because $E|R|^{\alpha} = 2^{\alpha/2}\,\Gamma(1 + \frac{\alpha}{2})$, the dispersion is

7  ^  7deterni(^)  ^  1^(1 -h  ^).  (H) 

The dispersion factors for lognormal shadowing, Rayleigh fading, and combined shadowing and fading are shown in Fig. 1 as a function of $\alpha$. The curves are plotted with the shadowing standard deviation $\sigma_s = 10$ dB. We see that in all cases examined, the dispersion factors are increasing functions of $\alpha$.

^The RVs $\exp(\sigma G_i)\mathbf{X}_i$ are spherically symmetric (SS) because a product of a univariate RV and a SS RV is SS. Also, they are independent because $\{G_i\}$ and $\{\mathbf{X}_i\}$ are assumed to be independent sequences of mutually independent RVs.

Figure 1. Dispersion factor for Rayleigh fading and lognormal shadowing ($\sigma_s = 10$ dB), plotted against $\alpha$.


5  Practical  Considerations 

In Section 2, we made two idealized assumptions: (i) we assumed an infinite number of interferers; and (ii) we assumed that an interfering signal was present for the entire duration of the matched-filtering interval. In [3], we show that these assumptions do not constrain our analysis. Moreover, we demonstrate that the $\alpha$-stable model applies when interferers form non-homogeneous Poisson fields. The last result is achieved by mapping the processes in the plane (volume) into homogeneous processes on the line. This is because the LePage series representation applies only to Poisson processes with a constant rate. For example, if the non-homogeneous point process in the plane has the rate function $\lambda(r) = \lambda_0 r^{\beta-2}$ and $\beta < 2m$, then $r_i^{\beta}$ represents the homogeneous Poisson process with the rate $\lambda = \lambda_0 \frac{2\pi}{\beta}$. With this result, we can proceed as in Section 2 and arrive at the stable model with $\alpha = \beta/m$ and

$$\gamma = -\lambda_0 \frac{2\pi}{\beta} K^{\alpha} \int_0^{\infty} \frac{\Phi_0'(x)}{x^{\alpha}}\, dx.$$

Also, if we assume that the interferers are Poisson distributed only in a sector of the plane with an angle $\phi$ and that their density is $\lambda$, then we can map such a process to a homogeneous Poisson point process in the whole plane with the rate $\lambda^* = \lambda \frac{\phi}{2\pi}$. The latter scenario is applicable to directional antennas, as opposed to the omnidirectional antennas discussed so far.

6  Concluding  Remarks 

In this paper, we have characterized interference for multiple access communication systems in which interferers are assumed to be Poisson-distributed in the plane. The same development applies to radar clutter. Assuming the average inverse power attenuation of signal strength over distance, interference in the system is shown to be SS $\alpha$-stable noise. This model specifies system noise with two parameters: the characteristic exponent $\alpha$ and the dispersion $\gamma$. The formulas derived in the paper allow us to predict noise statistics in environments with lognormal shadowing and Rayleigh fading.

The hypothesis of $\alpha$-stable noise is partially confirmed by the impulsive character of clutter and multiple access interference. In the end, however, it must be tested against experimental data. Verification of the alpha-stable noise model in radar applications is currently underway and the results will be announced shortly.


Appendix  A 


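Our proof of Theorem 1 is based on the generalized LePage series representation of SS $\alpha$-stable distributions:

Theorem 2 Let $\{\tau_i\}$ denote the “arrival times” of a Poisson process^ with rate $\lambda$ and let $\{\mathbf{X}_i\}$ be SS i.i.d. random vectors satisfying $E\|\mathbf{X}_i\|^{\alpha} < \infty$, or equivalently $E|X_{ij}|^{\alpha} < \infty$. Then

$$\mathbf{Y} = \sum_{i=1}^{\infty} \tau_i^{-1/\alpha}\, \mathbf{X}_i \qquad (12)$$

converges a.s. to a SS $\alpha$-stable random vector $\mathbf{Y}$ with the characteristic function (ch.f.)

$$\Phi_{\mathbf{Y}}(\mathbf{t}) = \exp(-\gamma \|\mathbf{t}\|^{\alpha}). \qquad (13)$$

The dispersion parameter $\gamma$ is given as

$$\gamma = -\lambda \int_0^{\infty} \frac{\Phi_0'(x)}{x^{\alpha}}\, dx. \qquad (14)$$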

The detailed proof of Theorem 2 can be found in [3]. To link the multivariate version of the LePage series in Theorem 2 with the noise equation in (3), we need to map a Poisson point process in the plane (volume) onto the homogeneous Poisson process on the line. To achieve this, we use the following proposition, which results from the mapping theorem of Poisson point processes [6]:

Proposition 1 For a homogeneous Poisson point process in the plane (volume) with the rate $\lambda$, assuming that points are at distances $r_i$ ($r_1 < r_2 < \cdots$) from the origin, $\tau_i = r_i^2$ ($\tau_i = r_i^3$) represents Poisson arrival times on the line with the constant arrival rate $\pi\lambda$ ($\frac{4}{3}\pi\lambda$).
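The planar case of Proposition 1 can be illustrated by running the mapping in reverse: draw arrival times on the line with rate $\pi\lambda$ and take square roots to obtain the ordered distances $r_i$. The sketch below is a hypothetical illustration (function names, seeds and parameter values are assumptions); it checks only that the expected number of points inside a disk of radius `r_max` matches $\lambda\pi r_{\max}^2$.

```python
import math
import random

def sample_radii(lam, r_max, rng):
    """Sorted distances from the origin of a homogeneous planar Poisson
    process of rate lam, restricted to the disk of radius r_max."""
    rate = math.pi * lam          # arrival rate on the line for tau_i = r_i^2
    radii, t = [], 0.0
    while True:
        t += rng.expovariate(rate)    # exponential gap to the next arrival time
        if t > r_max ** 2:
            return radii
        radii.append(math.sqrt(t))    # map the arrival time back to a distance

def mean_count(lam, r_max, trials=2000, seed=1):
    # Average number of points in the disk over independent realizations
    rng = random.Random(seed)
    return sum(len(sample_radii(lam, r_max, rng)) for _ in range(trials)) / trials

if __name__ == "__main__":
    lam, r_max = 1.0, 3.0
    print(mean_count(lam, r_max), lam * math.pi * r_max ** 2)
```

The same radii could then be fed into a truncated version of the series in (3) to generate interference samples, although truncation of the infinite sum is a further approximation.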


^In this paper, we use the term arrival times or occurrence times of a Poisson process to mean a Poisson process on the line, where time is just a hypothetical variable.


Now, for interferers distributed in the plane, we rewrite $\mathbf{Y}$ in (3) as

$$\mathbf{Y} = K \sum_{i=1}^{\infty} (r_i^2)^{-m/2}\, \mathbf{X}_i. \qquad (15)$$

From Proposition 1, $\tau_i = r_i^2$ represents Poisson “occurrence” times on the line with the arrival rate $\pi\lambda$, and based on Theorem 2, $\mathbf{Y}$ is SS $\alpha$-stable with the characteristic exponent $\alpha = 2/m$ and dispersion $\gamma = -\lambda\pi K^{\alpha} \int_0^{\infty} \frac{\Phi_0'(x)}{x^{\alpha}}\,dx$. The multiplicative constant $K$ changes the dispersion of the $\alpha$-stable RV by $K^{\alpha}$ [7]. A similar proof follows for interferers distributed in the volume. The equivalence of Eqs. (5) and (6) follows from the integral formula ([2], 3.823)

$$\int_0^{\infty} \frac{1 - \cos(zt)}{t^{\alpha+1}}\, dt = |z|^{\alpha}\, \frac{\Gamma(1-\alpha)\cos(\frac{\pi}{2}\alpha)}{\alpha} \qquad (16)$$

by replacing the constant $z$ with the RV $X_{ij}$ and taking the expectation of both sides.

Abstract

We study the robustness of Fractionally-Spaced Equalization by CMA (FSE-CMA) to channel noise and lack of disparity. When there is lack of disparity, we show that, whereas other recent techniques such as linear prediction or subspace-like methods fail, FSE-CMA can still equalize. In particular, for a long enough equalizer, FSE-CMA exhibits a “smoothing effect” which leads to an interesting trade-off between achieving zero-forcing equalization and noise enhancement.

1.  Introduction 

The Constant Modulus Algorithm (CMA) [1] is one of the most commonly used blind algorithms to suppress InterSymbol Interference (ISI) in digital transmission systems. It is called FSE-CMA (Fractionally-Spaced Equalization by CMA) when used in a channel diversity scheme generated either by oversampling the received data or by multivariate data observed at the output of a sensor array. In a previous work ([2]), it has been shown that the FSE-CMA criterion minimization achieves perfect equalization (in the noise-free context) under the so-called zero-forcing (ZF) conditions (no common zero in the multichannel transfer function, i.e., $c_0(z) = 1$ in Figure 1, and a long enough equalizer) ([2], [3]). Moreover, contrary to the second-order statistics based methods ([4], [3], [5], [6]...), FSE-CMA still performs reasonable equalization even when there is lack of channel disparity (i.e., $c_0(z) \neq 1$) (see the noise-free preliminary study [2], for instance). Furthermore, we have shown in a previous study that under ZF conditions FSE-CMA exhibits some robustness to channel noise ([7]).

In this contribution we are motivated by the desire to evaluate the FSE-CMA global performance criterion in realistic noisy conditions. We therefore address the effect of additive white noise and lack of channel disparity on the FSE-CMA criterion, in terms of the input-output residual mean square error. This also allows us to define an equalizability bound that makes it possible to compare the optimal FSE-CMA performance to other recent fractionally-spaced equalization techniques.

2. FSE under lack of disparity

Under lack of disparity, we consider the fractionally-spaced model driven by a zero-mean i.i.d. sequence $s(n)$ and corrupted by an L-dimensional additive noise $w(n) = (w_1(n), \ldots, w_L(n))^T$ (Figure 1).

Figure 1. FSE Scheme Under Lack of Disparity

The linear equalization problem consists in choosing the “best” L-variate Finite Impulse Response (FIR) equalizer transfer function $e(z)$, of degree $N$, such that $y(n) \approx s(n-\nu)$, where $\nu$ is an arbitrary delay (each $e_k(z)$ is written as $e_k(z) = \sum_{p=0}^{N-1} e_{k,p} z^{-p}$). The channel is described by $c_0(z)$, a possibly non-minimum phase scalar transfer function of degree $Z_0$, and $c(z)$, an L-variate FIR non-reducible vector transfer function (i.e., there is no common zero to all components $c_k(z)$) of degree $(Q - Z_0)$.

This problem formulation amounts to choosing the $NL$-long equalizer impulse response $e$ such that:

$$y(n) = (e^T C\, C_0)\, S(n) + e^T W(n) \approx s(n-\nu) \qquad (1)$$

where $S(n)$ contains the last $(N+Q)$ input symbols in the past of $s(n)$ (with $N - 1 > Q$) and $W(n)$ is the multivariate noise vector $(w(n), \ldots, w(n-N+1))^T$. Here, $C_0$ denotes the $(Q - Z_0 + N) \times (N + Q)$ channel convolution matrix associated to $c_0(z)$ and $C$ denotes the $(NL) \times (Q - Z_0 + N)$ channel convolution matrix associated to the multichannel minimum phase transfer function $c(z)$. Note that $C$ and $C_0$ are respectively full column-rank and full row-rank Sylvester matrices ([2]).

3.  Smoothed  FSE-CMA  criterion 

Lemma 1 Consider the FSE-CMA normalized criterion ([1])

$$J_\gamma(e) = E[(r_2 - y^2(n))^2] / E[s^2]^2 \qquad (2)$$

with $r_2 = E[s^4]/E[s^2]$, where $E[\cdot]$ stands for the expectation operator. Assuming independence between the noise and source signals and a temporally/spatially white Gaussian noise, we get:

$$J_\gamma(e) = J_0(e) + \gamma \|e\|^2 \left[ 2\left(3\|C_0^T C^T e\|^2 - \rho\right) + 3\gamma \|e\|^2 \right] \qquad (3)$$

where $J_0(e)$ is the noise-free cost function, $\gamma$ is the noise-to-signal power ratio, and $\rho = E[s^4]/E[s^2]^2$ is the input signal kurtosis.

Proof: Using (1), the proof follows from a straightforward calculation (see [2]). Note that the expression can easily be extended to non-Gaussian noise since we know the fourth-order moment of the noise contribution. □
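The noise term in (3) can be checked by simulation for a real BPSK input ($E[s^2] = 1$, $\rho = 1$, $r_2 = 1$). The sketch below works directly with a global response `h` (standing in for $C_0^T C^T e$) and a value of $\|e\|^2$, so no channel matrices are built; the closed form used for $J_0$ is the standard fourth-moment expansion for an i.i.d. real input and is an assumption of this example rather than a formula quoted from the paper.

```python
import random

def j0(h, rho):
    # Noise-free normalized CMA cost for an i.i.d. zero-mean real input with kurtosis rho
    n2 = sum(x * x for x in h)
    n4 = sum(x ** 4 for x in h)
    return rho ** 2 - 2.0 * rho * n2 + 3.0 * n2 ** 2 + (rho - 3.0) * n4

def j_gamma(h, e_norm2, gamma, rho):
    # Equation (3): noise-free cost plus the gamma-driven smoothing term
    n2 = sum(x * x for x in h)
    return j0(h, rho) + gamma * e_norm2 * (2.0 * (3.0 * n2 - rho) + 3.0 * gamma * e_norm2)

def j_gamma_mc(h, e_norm2, gamma, n=100_000, seed=0):
    # Monte Carlo estimate for BPSK; the filtered noise e^T W(n) is N(0, gamma * ||e||^2)
    rng = random.Random(seed)
    std = (gamma * e_norm2) ** 0.5
    acc = 0.0
    for _ in range(n):
        y = sum(hk * rng.choice((-1.0, 1.0)) for hk in h) + rng.gauss(0.0, std)
        acc += (1.0 - y * y) ** 2
    return acc / n

if __name__ == "__main__":
    h, e_norm2, gamma = [0.1, 1.0, -0.2], 1.5, 0.05
    print(j_gamma(h, e_norm2, gamma, rho=1.0), j_gamma_mc(h, e_norm2, gamma))
```

The example makes the structure of (3) visible: for a fixed response `h`, the penalty grows with $\|e\|^2$, which is the source of the smoothing effect discussed next.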

3.1.  Further  results  under  lack  of  disparity 

The channel/equalizer impulse response setting that zeroes the noise-free cost function $J_0(e)$ is written as:

$$h_\nu = C_0^T C^T e \iff c(z)^T e(z) = z^{-\nu}/c_0(z) \qquad (4)$$

which is not possible with an FIR equalizer. Because of lack of disparity, the best achievable $h$ may be far from any optimal setting $h_\nu = (0 \cdots 0\,1\,0 \cdots 0)^T$. More precisely, the only achievable impulse responses $h = C_0^T \varepsilon$ live in the subspace spanned by the columns of $C_0^T$. In particular, the achievable $h$ closest to $h_\nu$ is given by the orthogonal projection of $h_\nu$ on the range of $C_0^T$, $h = C_0^T (C_0 C_0^T)^{-1} C_0 h_\nu$. We set $\Pi_0 = C_0^T (C_0 C_0^T)^{-1} C_0$. In fact, for a given achievable $h = C_0^T \varepsilon$ there exists a unique $\varepsilon$ such that $\varepsilon = C^T e$ and $NL - (N + Q - Z_0)$ possible settings for $e$. In this case, the cost-function extrema of $J_0(e(\varepsilon))$ satisfy $C_0 \Delta(C_0^T \varepsilon) C_0^T \varepsilon = 0$, with $\Delta(C_0^T \varepsilon) = (3\|C_0^T \varepsilon\|^2 - \rho) I - (3 - \rho)\,\mathrm{diag}(C_0^T \varepsilon \varepsilon^T C_0)$, where $\mathrm{diag}(A)$ stands for the matrix extracted from $A$ with the same diagonal entries and 0 elsewhere. They can be classified as ([2]):

• one maximum ($e = 0$),

• global minima (when $C_0^T \varepsilon = h_\nu$ is achievable with $\varepsilon \neq 0$) or saddle points ($\Delta(C_0^T \varepsilon) C_0^T \varepsilon = 0$ and $C_0^T \varepsilon \neq h_\nu$),

• local minima ($C_0 \Delta(C_0^T \varepsilon) C_0^T \varepsilon = 0$ and $\varepsilon$ does not belong to the previous categories).
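The extrema condition above is the chain rule applied to the gradient of the noise-free cost with respect to the global response $h$, which equals $4\Delta(h)h$. A small finite-difference check of that gradient is sketched below; it assumes a real i.i.d. input and the fourth-moment closed form of $J_0$ in terms of $h$, which is an assumption of this example and not quoted from the paper.

```python
def j0(h, rho):
    # Noise-free normalized CMA cost as a function of the global response h
    n2 = sum(x * x for x in h)
    n4 = sum(x ** 4 for x in h)
    return rho ** 2 - 2.0 * rho * n2 + 3.0 * n2 ** 2 + (rho - 3.0) * n4

def grad_j0(h, rho):
    # 4 * Delta(h) h with Delta(h) = (3||h||^2 - rho) I - (3 - rho) diag(h h^T)
    n2 = sum(x * x for x in h)
    return [4.0 * ((3.0 * n2 - rho) * x - (3.0 - rho) * x ** 3) for x in h]

def numeric_grad(h, rho, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = []
    for k in range(len(h)):
        up = list(h); up[k] += eps
        dn = list(h); dn[k] -= eps
        g.append((j0(up, rho) - j0(dn, rho)) / (2.0 * eps))
    return g

if __name__ == "__main__":
    h = [0.3, -0.8, 0.5]
    print(grad_j0(h, 1.0))
    print(numeric_grad(h, 1.0))
    print(grad_j0([0.0, 1.0, 0.0], 1.0))   # a canonical vector is a stationary point
```

The gradient vanishes both at $h = 0$ (the maximum) and at any canonical vector $h_\nu$, in line with the first two categories of the classification.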

Note that a potential global minimum implies that the corresponding $\varepsilon$ is expressed as $(C_0 C_0^T)^{-1} C_0 h_\nu$, i.e., $h = \Pi_0 h_\nu$. Since $C_0$ cannot be square (it is an $(N + Q - Z_0) \times (N + Q)$ matrix), there should exist no global minima such that $h = h_\nu$. However, when $N$ becomes “large”, $C_0$ tends to become square so that $\Pi_0$ becomes close to the identity matrix. Of course, as in the non-fractional case, undesired settings may exist. However, the larger $N$ is, the closer the corresponding channel/equalizer response is to some $h_\nu$.
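How quickly $\Pi_0 h_\nu$ approaches $h_\nu$ can be illustrated for a first-degree common factor. The sketch below assumes, purely for illustration, $c_0(z) = 1 + 0.4 z^{-1}$ (a single zero at $-0.4$) with real coefficients; `m` plays the role of $N + Q - Z_0$. In that case $C_0$ is $m \times (m+1)$ with full row rank, its null space is spanned by one vector $v$, and $I - \Pi_0 = v v^T / \|v\|^2$, so no matrix inversion is needed.

```python
A = 0.4  # hypothetical common factor c0(z) = 1 + A z^{-1}

def conv_matrix(m, a=A):
    # m x (m + 1) convolution (Sylvester) matrix: row i holds [1, a] at columns i and i + 1
    rows = []
    for i in range(m):
        row = [0.0] * (m + 1)
        row[i], row[i + 1] = 1.0, a
        rows.append(row)
    return rows

def null_vector(m, a=A):
    # Solves v[k] + a * v[k + 1] = 0 for every row, hence C0 v = 0
    return [(-a) ** (m - k) for k in range(m + 1)]

def zf_residual(m, nu, a=A):
    # ||(I - Pi0) h_nu||^2 for the canonical vector h_nu, using I - Pi0 = v v^T / ||v||^2
    v = null_vector(m, a)
    return v[nu] ** 2 / sum(x * x for x in v)

if __name__ == "__main__":
    for m in (1, 3, 9):
        print(m, zf_residual(m, 0))
```

For this minimum-phase choice the residual at delay $\nu = 0$ decays geometrically with `m`, which is one concrete instance of $\Pi_0$ becoming close to the identity for long equalizers.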

3.2.  Perturbation  in  noisy  case 

From Lemma 1, one can see that, in the noisy context, $J_0(e)$ is regularized by an additional deterministic factor $\Psi_\gamma(e(\varepsilon))$ driven by $\gamma$. This leads to a balance between the minimization of two criteria: $J_0$ and $\Psi_\gamma$. The result is a “smoothing effect” expressed through a twofold constraint: (i) a minimization of $\|C_0^T C^T e\|^2$ that leads to an impulse response norm $\|h\|$ as small as possible; (ii) a minimization of $\|e\|$, which tends to prevent the equalizer norm from being too high, consequently reducing the noise enhancement (see (1)).

Thus, the minima of $J_\gamma(e)$ realize a desirable balance between the noise-free good equalization settings and the noise enhancement due to the equalizer norm. To solve the minimization problem, we propose a two-step minimization procedure. First, we minimize $J_\gamma(e)$ over the subspace of vectors $e$ such that $\varepsilon = C^T e$ for a given $\varepsilon$. The resulting value of $e$ is a function of $\varepsilon$, denoted $e(\varepsilon)$. Then, we minimize $J_\gamma(e(\varepsilon))$ over the subspace of $(N + Q - Z_0)$-long vectors $\varepsilon$. Invoking equation (3), the two-step minimization can be expressed as

$$\min J_\gamma(e(\varepsilon)) = \min_{\varepsilon} \{ J_0(e(\varepsilon)) + \gamma \min_{e} \{ \Psi_\gamma(e(\varepsilon)) \} \} \qquad (5)$$

The procedure is simplified because $J_0(e)$ is a function of $\varepsilon$ only, so that the first step consists of minimizing the smoothing cost function $\Psi_\gamma(e)$ only.

The first-step minimization of the quadric cost function $\Psi_\gamma(e)$ (for a given $\varepsilon$), under the linear constraint $\varepsilon = C^T e$, can be performed using the Lagrange multiplier technique and leads to the zero-order approximation (for a large enough SNR):

$$e(\varepsilon) = C (C^T C)^{-1} \varepsilon + o(1)$$

The second step consists of minimizing:

$$J_\gamma(e(\varepsilon)) = J_0(e(\varepsilon)) + \gamma \|e(\varepsilon)\|^2 (3\|h(\varepsilon)\|^2 - \rho) + o(\gamma) \qquad (6)$$

where the noise-free cost function is $J_0(e(\varepsilon)) = E[(r_2 - (\varepsilon^T C_0 S(n))^2)^2]$. Taking the derivative of (6) with respect to $\varepsilon$, the extrema of (3) are solutions of the equation:

$$4 C_0 \Delta(C_0^T \varepsilon) C_0^T \varepsilon + 2\gamma \left[ 3 \|C (C^T C)^{-1} \varepsilon\|^2 C_0 C_0^T \varepsilon + (3\|C_0^T \varepsilon\|^2 - \rho)(C^T C)^{-1} \varepsilon \right] + o(\gamma) = 0 \qquad (7)$$

Our task is to provide a closed-form solution to equation (7). Since we do not know how to express the noise-free minima explicitly, we consider herein the perturbation of $\varepsilon = \varepsilon_\nu = (C_0 C_0^T)^{-1} C_0 h_\nu$, corresponding to $h = h_\nu$. We know that for a "large enough" equalizer length this will be a good approximation in the noise-free case.

4. Closed-form extrema

In order to get some insight into the noisy case, we assume that the approximation error is "smaller" than the perturbation due to the noise. This should hold for "large" values of $N$ and "not too small" values of $\gamma$. At the same time, $\gamma$ must be small enough to allow a first-order approximation in terms of $\gamma$. The validity of this assumption is checked by simulations in the sequel.

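Proposal 1 For a small enough $\gamma$, we assume the global channel-equalizer setting $\varepsilon_\gamma$ to be a first-order perturbation of $\varepsilon_\nu = (C_0 C_0^T)^{-1} C_0 h_\nu$ in terms of $\gamma$ as:

$$\varepsilon_\gamma = \varepsilon_\nu + \gamma \tilde{\varepsilon} + o(\gamma) \qquad (8)$$

Then, $\tilde{\varepsilon}$ satisfies,

$$\tilde{\varepsilon} \approx -\tfrac{1}{2} (C_0 \Delta_\nu C_0^T)^{-1} \left[ 3\, \varepsilon_\nu^T (C^T C)^{-1} \varepsilon_\nu\; C_0 C_0^T \varepsilon_\nu + (3 - \rho)(C^T C)^{-1} \varepsilon_\nu \right] \qquad (9)$$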

The corresponding channel/equalizer settings can be viewed as a perturbation of $h_\nu$,

hy  =  Cje^  W  [3eJ(C^C)“^e>^ 

+(3  -  p)Co"^(C^C)-‘Co"^ft.]  +  0(7)  (10) 

where $\Delta_\nu$ is a $(N + Q) \times (N + Q)$ diagonal matrix with entry $(3 - \rho)$ when $i \neq \nu + 1$ and $2\rho$ when $i = \nu + 1$, and $C_0^{\#} = C_0^T (C_0 C_0^T)^{-1}$.

Note that, for a large value of $N$, the symbol $\approx$ in (9) stands for the approximation $C_0^T \varepsilon_\nu \approx h_\nu$. If we assume in addition that the input is constant modulus, $\rho = 1$ and $\Delta_\nu = 2I$, the global impulse response minima are of the form:

h-y  a  [3  (hjcf  K 

+  Co^(C^C)-'Co"^h.]  +  0(7)  (11) 

This  result  is  similar  to  the  expression  of  when  ZF 
is  exactly  achievable  (i.e.,  when  (C^C)^^  is  replaced 
by  We  notice  once  again  that  FSE- 

CMA  criterion  has  very  specific  properties  for  constant 
modulus  input  signals. 

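Proof of Proposal 1: Introducing assumption (8) in equation (7), the proof consists in evaluating $\tilde{\varepsilon}$, i.e., the first-order solution (in terms of $\gamma$) of equation (7). Since $C_0^T \varepsilon_\nu \approx h_\nu$ for a large enough $N$, we easily obtain $\tilde{\varepsilon}$ as a solution of the linear system: $2 C_0 \Delta_\nu C_0^T \tilde{\varepsilon} = -3\, \varepsilon_\nu^T (C^T C)^{-1} \varepsilon_\nu\; C_0 C_0^T \varepsilon_\nu - (3 - \rho)(C^T C)^{-1} \varepsilon_\nu + o(1)$, where $C_0 \Delta_\nu C_0^T$ is invertible if the input signal is not Gaussian (i.e., $\rho \neq 3$). □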

Simulations: A 2-dimensional multichannel vector $c(z)$ is defined by the zeros of each transfer function as $c_1(z) = (-1.4, -0.4)$ and $c_2(z) = (1.1, -0.4)$. The number of observations is set to $N = 8$. Figure 2 displays the impulse response taps of $h_\gamma$ (obtained by running the algorithm to minimize the criterion) versus SNR. Note that $h_\gamma$ is very close to a canonical vector for a large enough $N$ and SNR. In Figure 3, we display the analytical impulse response introduced in Proposal 1. We can see that both curves are very close.


Figure 2: h versus SNR

Figure 3: analytical h versus SNR

5. Mean square input/output error

In this section we evaluate the mean equalizability performance in terms of the normalized input/output mean square error (MSE) defined as $E[(y(n) - s(n-\nu))^2]/E[s^2]$.

Proposal 2 The MSE is a sum of residual ISI and noise enhancement (driven by $\gamma$). For a large enough SNR the FSE-CMA MSE can be approximated by:

$$\underbrace{\|(I - \Pi_0) h_\nu\|^2}_{\text{Zero-Forcing}} + \underbrace{\gamma\, h_\nu^T C_0^{\#} (C^T C)^{-1} C_0^{\#T} h_\nu}_{\text{Noise Enhancement}} + o(\gamma) \qquad (12)$$

The equalizability bound is expressed as the sum of an irreducible error due to the pseudo-inversion of $c_0(z)$ and a linear error proportional to $\gamma$. Note that for a long enough equalizer, $\Pi_0 \approx I$, so that the MSE is mostly due to noise enhancement.

Proof of Proposal 2: The MSE is written as MSE $= \|h - h_\nu\|^2 + \gamma \|e\|^2$. Introducing the parametrization $h = C_0^T \varepsilon$ and the assumption (8), we get $e_\gamma = e(\varepsilon_\gamma) = C (C^T C)^{-1} (C_0 C_0^T)^{-1} C_0 h_\nu + o(1) = C (C^T C)^{-1} \varepsilon_\nu + o(1)$, and $h_\gamma = C_0^T \varepsilon_\nu + o(1)$, which immediately yields (12). □
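The decomposition MSE $= \|h - h_\nu\|^2 + \gamma\|e\|^2$ used at the start of the proof can be checked numerically. The sketch below is illustrative only: it assumes a real BPSK input with unit power, treats the filtered noise $e^T W(n)$ as a zero-mean Gaussian variable of variance $\gamma\|e\|^2$, and takes the global response `h` and $\|e\|^2$ as given inputs rather than deriving them from channel matrices.

```python
import random

def mse_closed_form(h, nu, e_norm2, gamma):
    # Residual ISI ||h - h_nu||^2 plus noise enhancement gamma * ||e||^2
    isi = sum((hk - (1.0 if k == nu else 0.0)) ** 2 for k, hk in enumerate(h))
    return isi + gamma * e_norm2

def mse_mc(h, nu, e_norm2, gamma, n=100_000, seed=0):
    # Monte Carlo estimate of E[(y(n) - s(n - nu))^2] for unit-power BPSK symbols
    rng = random.Random(seed)
    std = (gamma * e_norm2) ** 0.5
    acc = 0.0
    for _ in range(n):
        s = [rng.choice((-1.0, 1.0)) for _ in h]
        y = sum(hk * sk for hk, sk in zip(h, s)) + rng.gauss(0.0, std)
        acc += (y - s[nu]) ** 2
    return acc / n

if __name__ == "__main__":
    h, nu, e_norm2, gamma = [0.1, 0.9, -0.2], 1, 1.5, 0.05
    print(mse_closed_form(h, nu, e_norm2, gamma), mse_mc(h, nu, e_norm2, gamma))
```

Because the first term does not depend on $\gamma$, it acts as a floor at high SNR, while the second term scales linearly with the noise-to-signal ratio.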

An interesting point is that the first-order FSE-CMA MSE is the same as the MMSE obtained by minimizing $E[(y(n) - s(n-\nu))^2]$, even if the channel-equalizer global impulse response minima differ between the two criteria.

Simulations: We use a 2-dimensional multichannel vector $c(z)$ defined by the zeros of each transfer function as $(-1.4, 1.1)$ and we take $c_0(z) = z + 0.4$. The number of observations is set to $N = 8$ (Figure 4) and $N = 2$ (Figure 5). Both curves show the agreement between the experimental and the analytical FSE-CMA MSE (12). In Figure 4, $N$ is large enough to have $\|(I - \Pi_0) h_\nu\|^2 \approx 0$. In Figure 5 the analytical curve (-) is the sum of the experimental irreducible zero-forcing error (.-) and the linear noise-enhancement error (--).


6.  Conclusion 

Under the realistic assumptions of lack of channel disparity and additive channel noise, we have established in this contribution an analytical closed form of the FSE-CMA global impulse response. Whereas other recent second-order methods fail, we have shown that FSE-CMA realizes an interesting trade-off between noise enhancement and achievable equalizability. In order to evaluate the equalizability performance, a closed-form expression of the mean input/output steady-state square error has been derived. For large SNR and large $N$, the FSE-CMA performance is very similar to the best achievable performance of a blind linear equalizer.

ABSTRACT

In this paper, a blind adaptive beamforming algorithm is presented which improves the performance of CAB [4]. Noting that the weighting vectors of CAB are not in general proportional to signal steering vectors in the case of multiple signals, a singular vector rotation technique is used to iteratively estimate the steering vectors. Using the estimated steering vectors as the constraint matrix in the LCMV algorithm [2], better interference suppression is achieved. Computer simulations are conducted to demonstrate that the performance of the proposed algorithm is superior to that of CAB for the scenario of multiple co-channel users transmitting at the same frequency.

1.  INTRODUCTION 

The use of multiple high gain agile beams from a multiple element array antenna with on-board digital beamforming [1] is being considered in the next generation of mobile satellite communication systems (MSCS). The main advantages of the system are that it offers a flexible solution for channel allocation and it can actively suppress co-channel interference. Active interference suppression can be achieved by using on-board adaptation. The Linear Constrained Minimum Variance (LCMV) algorithm seems to be the most suitable adaptive beamforming method for the multiple agile beam MSCS [8]. The LCMV method requires the locations of mobile users in order to steer the high gain beam towards the desired users and place nulls at specific co-channel interferers. Mobile user localization can be established by on-board processing using high-resolution techniques, which, however, can be very computationally intensive and require calibration of the array. On the other hand, blind adaptive beamforming methods exploiting the cyclostationarity [3] of communication signals attract attention because they require neither mobile localization nor array calibration.

The cyclic adaptive beamforming (CAB) algorithm [4], one of the blind adaptive beamforming methods, has been proposed as a good candidate for spatial re-use of the frequency spectrum. However, the performance of CAB deteriorates when multiple desired signals are present. Here, an improved CAB algorithm is proposed which can iteratively generate a better estimate of the steering vectors of multiple signals than CAB does. Using the estimated steering vectors as the constraint matrix in the LCMV algorithm, better interference suppression is achieved. Computer simulations are conducted to demonstrate the performance of the proposed algorithm.

2. BLIND ADAPTIVE BEAMFORMING ALGORITHM

The basic idea of CAB is to formulate the cyclic (conjugate) correlation of the array output $x(n)$ and its frequency-shifted version $u(n) = x(n + n_0) e^{j2\pi\alpha n}$ (or $u(n) = x^*(n + n_0) e^{j2\pi\alpha n}$) at a particular cyclic frequency $\alpha$ of the desired signals so that the interference and noise which do not exhibit the same cyclic frequency can be eliminated. It has been proved [4] that the weighting vectors of CAB corresponding to individual desired signals are the left singular vectors of the cyclic (conjugate) correlation matrix of the array output, i.e.,

$$R_{xu} = W_{CAB} \Lambda V^{\dagger} \qquad (1)$$

where $W_{CAB}$ is the left singular vector matrix (each column of $W_{CAB}$ denotes the weighting vector of each desired signal), $\Lambda$ is the singular value matrix and $V$ is the right singular vector matrix.

CAB can asymptotically achieve optimal SINR when there is a single desired signal and the weighting vector $w_{CAB}$ is proportional to the steering vector of the desired signal. However, the performance of CAB deteriorates when there are multiple signals, each having the same cyclic frequency. This is not surprising because in general the left singular vectors are not proportional to the individual signal steering vectors unless the multiple signals are well separated, in the sense that the signals have spatial separations of more than one beamwidth and/or are of very uneven power. Therefore, CAB is intended to work in the scenario of a single desired user, i.e., co-channel users all have different cyclic frequencies. This results in extra bandwidth consumption.

In this section, an improved CAB algorithm is proposed which exploits the fact that the left singular vectors of $R_{xu}$ and the signal steering vectors span the same subspace (signal subspace), so that a singular vector rotation technique can be used to iteratively estimate the steering vectors, assuming that the signals are statistically independent of each other. The steering vector model includes both the individual elemental errors and the spatial properties of the signal; thus the procedure is valid for an uncalibrated array. The improved CAB algorithm allows multiple desired users, or co-channel users operating at the same frequency, to achieve bandwidth saving.

It is well known that the matrix $R_{xu}$ can also be rewritten as its steering vector decomposition, i.e.,

$$R_{xu} = D R_s^{\alpha} D^{\dagger} \qquad (2)$$

where $D$ is the matrix of signal steering vectors and $R_s^{\alpha}$ is the cyclic (conjugate) correlation matrix of the signals. Therefore, we can see that

Column space of $W_{CAB}$ = Column space of $D$ (3)

It has been proved [5] that there exists a unitary matrix $Q$ such that

=WcabA;Q  (4) 

Since $D$, $R_s^{\alpha}$ and $Q$ are unknown, Eq. (4) does not have a unique solution; however, a priori information about the structure of a steering vector can be used to iteratively find the matrices $D$, $R_s^{\alpha}$ and $Q$ that satisfy the above equality.

The detailed procedure for iteratively solving Eq. (4) can be found in [6,7].

3. OPTIMAL-CONSTRAINED LMS WEIGHTING VECTORS

Once $D$, i.e., the steering vectors of the co-channel users, is resolved, the LCMV beamforming algorithm can be used to suppress interference. The principle of LCMV beamforming is to constrain the beamformer so that signals from the directions of interest are passed with specified gain and phase. The weighting vector $w_k$ is chosen to minimize the output variance (power) subject to the response constraints, i.e.,

$$\min_{w_k}\; w_k^{\dagger} R_x w_k \qquad (5)$$

$$\text{s.t.}\quad D^{\dagger} w_k = g \qquad (6)$$

where $R_x = E\{x(n) x^{\dagger}(n)\}$ is the correlation matrix of the antenna array output and $g$ is the response vector of the form

$$g = [0 \cdots 0\; 1\; 0 \cdots 0]^T \qquad (7)$$

where the “1” in $g$ occurs at the kth position for the response to the kth desired user, the “0” entries are the responses to the interferers, and $^T$ denotes transpose.

Solving the minimization in Eqs. (5) and (6), the optimal weighting is given as

$$w_{k,\mathrm{opt}} = R_x^{-1} D [D^{\dagger} R_x^{-1} D]^{-1} g \qquad (8)$$

Since the correlation matrix $R_x$ is unknown a priori, it has to be learned by an adaptive technique. In constrained gradient-descent optimization, the weighting vector is initialized at a vector satisfying the constraint in Eq.(6), and at each iteration the weighting vector is moved in the negative direction of the constrained gradient. Thus, the adaptation can be done as


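$w_k^{(n+1)} = [I - D(D^{H}D)^{-1}D^{H}] \, [w_k^{(n)} - \mu \hat{R}_x(n) w_k^{(n)}] + w_k^{(0)}$  (9)

$w_k^{(0)} = D(D^{H}D)^{-1} g$  (10)

where $\mu$ is a scalar that controls the step size of the adaptive process and is usually chosen as

$0 < \mu < 1/\lambda_{max}$  (11)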

with $\lambda_{max}$ being the maximum eigenvalue of the correlation matrix $R_x$. $\hat{R}_x(n)$ in Eq.(9) denotes an estimate of $R_x$ at the nth iteration. A readily available and simple approximation for $R_x$ at the nth iteration is the outer product of the array output, $x(n)x^{H}(n)$. Substituting this estimate into Eq.(9) gives

$w_k^{(n+1)} = [I - D(D^{H}D)^{-1}D^{H}] \, [w_k^{(n)} - \mu x(n) x^{H}(n) w_k^{(n)}] + w_k^{(0)}$  (12)

$w_k^{(0)} = D(D^{H}D)^{-1} g$  (13)

An alternative estimate of $R_x$ is given by

$\hat{R}_x(n+N) = \frac{1}{N} \sum_{i=1}^{N} x(n+i) x^{H}(n+i)$  (14)

Substituting Eq.(14) into Eq.(9) yields an adaptation of the weighting vectors using block data (the block length is N), i.e.,

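$w_k^{(n+N)} = [I - D(D^{H}D)^{-1}D^{H}] \, [w_k^{(n)} - \frac{\mu}{N} \sum_{i=1}^{N} x(n+i) x^{H}(n+i) w_k^{(n)}] + w_k^{(0)}$  (15)

$w_k^{(0)} = D(D^{H}D)^{-1} g$  (16)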

In this way, the computational load can be reduced and possibly better performance can be achieved [9].

We notice that the estimated signal steering vectors provide the initial weighting. The detailed derivation and the convergence analysis of the adaptive procedure can be found in [2].
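The block adaptation can be sketched as follows. This is a Frost-type constrained LMS loop written under the assumption that the update projects the gradient step onto the constraint subspace and then adds the initial weighting back; the function name, the step size and the simplified BPSK data model (no frequency offsets, no calibration errors) are illustrative choices, not the setup of the simulations in Section 4.

```python
import numpy as np

def constrained_block_lms(X, D, g, mu, block_len):
    """Block constrained LMS. X has shape (n_elements, n_snapshots)."""
    n_elem = D.shape[0]
    DhD_inv = np.linalg.inv(D.conj().T @ D)
    w0 = D @ DhD_inv @ g                           # initial weighting, satisfies D^H w0 = g
    P = np.eye(n_elem) - D @ DhD_inv @ D.conj().T  # projector onto the null space of D^H
    w = w0.copy()
    for start in range(0, X.shape[1] - block_len + 1, block_len):
        Xb = X[:, start:start + block_len]
        R_hat = Xb @ Xb.conj().T / block_len       # block estimate of the correlation matrix
        w = P @ (w - mu * (R_hat @ w)) + w0        # constrained gradient step
    return w

# Hypothetical data: three BPSK users on a 7-element half-wavelength array.
rng = np.random.default_rng(0)
n_elem, n_snap = 7, 600
angles = np.deg2rad([15.0, 0.0, -10.0])
D = np.exp(1j * np.pi * np.outer(np.arange(n_elem), np.sin(angles)))
amps = np.sqrt([1.0, 1.0, 10.0])
S = rng.choice([-1.0, 1.0], size=(3, n_snap)) * amps[:, None]
noise = (rng.standard_normal((n_elem, n_snap))
         + 1j * rng.standard_normal((n_elem, n_snap))) / np.sqrt(2)
X = D @ S + noise

g = np.array([1.0, 0.0, 0.0])
w = constrained_block_lms(X, D, g, mu=1e-3, block_len=5)
constraint_response = D.conj().T @ w
```

Because the projector annihilates any component along the columns of D and the initial weighting is re-added at every step, the constraint of Eq.(6) holds after each block regardless of the data, and rounding errors in the constraint do not accumulate.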


4.  COMPUTER  SIMULATIONS 

The performance of the improved CAB algorithm is demonstrated by two computer simulations using a 7-element uniform linear array with half-wavelength spacing. Simulated data are generated incorporating array calibration errors, where the calibration phase errors are uniformly distributed over ±π/8 and the gain errors are uniformly distributed over [0.8, 1.2]. White Gaussian noise is added at each array element.

In the first example, three co-channel users of BPSK signals with identical normalized data rate 0.5, normalized frequency offset 0.2 and roll-off factor 0.5 are incident upon the array from DOAs of 15°, 0° and -30° with respect to (w.r.t.) the normal of the array. The relative powers are 0 dB, 0 dB and 10 dB respectively. A low SNR of -7 dB is chosen to illustrate the performance in the presence of weak desired signals. Fig. 1(a) shows the beam pattern resulting from Eq.(8) using the steering vectors estimated by Eq.(4), with the signal from 15° being considered as the desired signal. We observe that two deep nulls are placed at -30° and 0° to suppress the interferences, with 0 dB gain at 15° (the response of the desired signal is chosen as 1). For comparison, the beam pattern resulting from Eq.(8) using the weighting vectors of CAB, $W_{cab}$, is plotted in Fig. 1(b). It is apparent that the suppression of the signal from -30°, which is far apart from the other two signals, is adequate, resulting in a deep null at -30°, while the other two weighting vectors do not correspond to the steering vectors of the signals from 15° and 0°, and result in no null at 0°.


Fig.1(a) Beam pattern w.r.t. the signal from 15° using the estimated steering vectors

Fig.1(b) Beam pattern w.r.t. the signal from 15° using $W_{cab}$


In the second example, three co-channel users of BPSK signals are incident upon the array from 15°, 0° and -10° w.r.t. the normal of the array, with identical normalized data rate 0.5 and roll-off factor 0.5 but different normalized frequency offsets of 0.2, 0.2 and 0.3 respectively. The signal powers are again 0 dB, 0 dB and 10 dB respectively. The signals from 15° and 0° are considered as the desired signals. The SNR is 0 dB. In the experiment, the estimated steering vectors of the two desired signals are obtained by iteratively solving Eq.(4). Then the block data adaptation given by Eqs.(15) and (16), with a block length of 5 samples, is employed. The output SINRs are plotted in Fig.2. We observe that the output SINRs of the signals converge after 600 samples.

Fig.2 Output SINR of the two desired users

5. CONCLUSION

The improved CAB algorithm provides a solution for user allocation in an agile beam system, achieving more efficient frequency re-use by improving the system performance for multiple users working at the same frequency.

Abstract

In this paper, time-frequency distributions (TFD) are applied for interference excision in spread spectrum communication systems. The focus is on jammers consisting of pulses of constant-envelope frequency modulated interference. The time-support and the instantaneous frequency (IF) information provided by the TFD are used to reduce the jammer effect on the receiver performance. This is achieved by applying an excision notch filter with a null placed at the interference IF. The filter is turned on and off in synchrony with the interference duty cycle. The bit error rates at different frequencies are given and compared with those obtained using multiresolution analysis.

1. Introduction

Direct  Sequence  Spread  Spectrum  (DSSS) 
systems  are  widely  used  in  communications  in  a  variety 
of  applications  including  suppression  of  a  strong 
interfering  signal  due  to  jamming  or  multipath 
propagation  and  providing  multiple  simultaneous  use  of 
the  same  spectrum.  These  systems,  however,  are  not 
jammer  proof.  In  order  to  increase  their  jammer 
resistance,  many  existing  DSSS  systems  are  augmented 
with  other  forms  of  signal  processing,  which  act  on 
improving  receiver  characteristics  and  increasing  the 
overall  jammer  resistance  [1,2].  Linear  excision  filters 
are  often  used  to  mitigate  interference.  The  filter 
coefficients  can  be  generated  using  various  estimation 
methods,  including  block  high  resolution  and  adaptive 
least mean squares techniques. Most of the existing interference excision algorithms, however, assume a stationary environment, or jammers with slowly-varying spectral characteristics. As such, receiver performance becomes unsatisfactory under highly nonstationary conditions and rapidly changing jamming environments. It is therefore desirable to devise excision methods which are based on jammer characteristics in the time-frequency domain, where the nonstationary characteristics of the jammer are revealed and accurate information on its power localization in both time and frequency is provided. In turn, one may be able to remove the nonstationary jammer with minimum distortion of the desired signal.

Two time-frequency based interference excision techniques have been recently proposed for improved receiver performance under nontraditional jammers. In the first approach, interference excision is achieved using time-frequency distributions. This approach was introduced by Amin [3] and detailed in [4,5,6]. In this case, the interference instantaneous frequency, obtained using appropriate time-frequency distributions, is used to form a time-varying linear phase excision filter. This filter has a notch which is in tune with the jammer IF. The second approach is based on multiresolution analysis [7], where the energy localization properties of the wavelet transform are employed to overcome the windowing effects associated with the short-time Fourier transform. For jammer excision, the wavelet transform is applied to the data and the coefficients of highest values, representing the jammer energy, are then removed. From the nature of these two techniques, it is clear that while the time-frequency distribution excision methods are most efficient for constant-envelope frequency modulated signals, where the jammer energy is concentrated around its IF, the wavelet transform is primarily effective when the jammer energy is captured in one or a few of the transform bins. The latter requires the wavelet tiling of the time-frequency plane to closely match the jammer characteristics.

In this paper, the performance of the above two techniques under pulse jamming is investigated. The jammer is a train of sinusoidal or chirp pulses with fixed duty cycles. Time-frequency distributions using several kernels, including the Wigner, Choi-Williams, cone-shaped and others, offer the means to detect the beginning and the end of each pulse [8]. Additionally, these kernels yield a good estimate of the jammer instantaneous frequency during the pulse period. As such, the excision filter can be designed with an appropriate notch and can be turned on and off according to the duty cycle of the jammer.

In Section 2, a brief review of TFD is presented, with a discussion of interference excision systems based on the instantaneous frequency estimate. The wavelet transform, as it is applied to the underlying problem, is discussed in Section 3. Section 4 presents the results of the bit error rate simulations, where the TFD-based excision and the wavelet transform excision techniques are compared.

2.  TFD  Interference  Excision  Systems 

Time-frequency distributions (TFD) are uniquely characterized by a two dimensional function, which is referred to as a "kernel". The t-f kernel can be designed such that the corresponding TFD satisfies several desired properties. For a full discussion of time-frequency distributions and kernel design methods, the reader is referred to reference [8]. Among the desired t-f properties is the capability to satisfy the instantaneous frequency condition. Generally, this property allows the TFD to exhibit peaks at the derivative of the phase of each signal component, irrespective of their time-varying nature.

The time-frequency distribution of the signal f(t) is defined as

$C_f(t, \omega; \phi) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \phi(t-u, \tau) \, f(u + \tau/2) \, f^{*}(u - \tau/2) \, e^{-j\omega\tau} \, du \, d\tau$  (1)

where "t" is the time index and "$\omega$" is the frequency index.

The t-f kernel $\phi(t, \tau)$ is a function of the time and lag variables. The well known Wigner distribution is a special case of (1) with $\phi(t, \tau) = \delta(t)$. A closer look at equation (1) reveals the simple fact that the TFD is the Fourier transform (FT) of an estimated autocorrelation function. However, contrary to the common way of performing time-averaging, the dependency of $\phi(t, \tau)$ on $\tau$ allows the autocorrelation function estimation to be different for different lags.
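To make the link between the TFD peak and the instantaneous frequency concrete, the sketch below evaluates one time slice of a discrete pseudo-Wigner distribution (the Wigner kernel with a rectangular lag window) and reads the IF from its peak. The function name, window length and chirp parameters are assumptions for illustration; the simulations in this paper use the Choi-Williams kernel, which is not implemented here.

```python
import numpy as np

def wigner_if_estimate(x, n, half_len):
    """IF estimate (cycles/sample) at time n from the peak of a pseudo-Wigner slice.

    Requires half_len <= n < len(x) - half_len. Valid for IF in [0, 0.25).
    """
    m = np.arange(-half_len, half_len)
    r = x[n + m] * np.conj(x[n - m])          # local autocorrelation at lag 2m
    spectrum = np.abs(np.fft.fft(r))          # FT over the lag variable
    k = int(np.argmax(spectrum[:half_len]))   # search non-negative frequencies only
    return k / (2.0 * len(m))                 # factor 2 because the lag is 2m

# Hypothetical linear chirp: IF(n) = 0.05 + 0.0002 * n cycles/sample.
t = np.arange(512)
chirp = np.exp(2j * np.pi * (0.05 * t + 0.5 * 0.0002 * t ** 2))
n0 = 256
true_if = 0.05 + 0.0002 * n0
estimated_if = wigner_if_estimate(chirp, n0, half_len=64)
```

For a linear chirp the product $f(n+m) f^{*}(n-m)$ is a pure tone at twice the instantaneous frequency, so the estimate is limited only by the frequency-bin spacing of the FFT. This is consistent with the later observation that fewer frequency bins bias the IF estimate.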

In addition to the instantaneous frequency, there are other common desired properties which qualify a TFD for proper representations of signals in time and frequency. These properties include the time support and frequency support. Both properties are important for the excision of pulsed and bandlimited jammers, since they, respectively, allow the TFD to be zero (show no power) at all time instants and frequency bins where the signal is not present. The TFD should also satisfy the marginal properties, in which the distribution of signal power over only the time variable or the frequency variable can be separately obtained from the joint TFD.

Fig.1 TFD Excision System

The interference excision system based on the TFD is shown in Fig.1. The IF is estimated using t-f kernels with desirable properties. Most importantly, the IF and time-support conditions must be satisfied. The IF is used to define the notch of a three-coefficient zero-phase filter. This filter is applied to both the input data and the PN sequence at the receiver. The outputs of both filters are then correlated and a decision rule is applied. In the simulation section, the Choi-Williams kernel is used for IF estimation.
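A gated three-coefficient notch can be sketched as below. The coefficient set $[1, -2\cos\omega_0, 1]$ is the standard symmetric FIR notch with a zero at $\omega_0$; it is used here as an assumption, since the filter coefficients are not spelled out in the text. The function names, the pulse timing and the interpretation of 1/7.1 as cycles per sample are likewise illustrative.

```python
import numpy as np

def notch_coefficients(if_cycles_per_sample):
    """Symmetric 3-tap FIR with a spectral null at the given frequency."""
    w0 = 2.0 * np.pi * if_cycles_per_sample
    return np.array([1.0, -2.0 * np.cos(w0), 1.0])

def excise(x, if_track, active):
    """Apply the time-varying notch only at samples where the jammer is on."""
    y = x.copy()
    for n in range(1, len(x) - 1):
        if active[n]:
            h = notch_coefficients(if_track[n])
            # Centered (zero-phase) application of the three coefficients.
            y[n] = h[0] * x[n - 1] + h[1] * x[n] + h[2] * x[n + 1]
    return y

# Hypothetical pulsed sinusoidal jammer, on for samples 100..199 only.
n = np.arange(400)
f0 = 1.0 / 7.1
active = (n >= 100) & (n < 200)
jammer = np.where(active, 10.0 * np.cos(2.0 * np.pi * f0 * n + 0.3), 0.0)
if_track = np.full(n.shape, f0)
residual = excise(jammer, if_track, active)
```

Inside the pulse the sinusoid is cancelled exactly, because $x(n-1) + x(n+1) = 2\cos(\omega_0) x(n)$ for any sinusoid at $\omega_0$; the samples at the pulse edges are not cancelled since one neighbour lies outside the pulse. Outside the pulse the data pass through unfiltered, which is the purpose of gating the filter with the time-support information.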

3.  Wavelet  Domain  Excision 

Much research has been accomplished applying wavelet and multirate methods to communications [9] and, in particular, to the interference excision problem [7]. For this study, a standard discrete wavelet transform (DWT) is performed on the received spread spectrum binary phase shift keyed (SS-BPSK) signal (rectangular pulse shaping) and the resulting coefficients, representing the signal in the wavelet basis, are modified via an excision rule that zeroes out the highest 10% of the transform coefficients. The reader should note that this is only one of many excision rules available, and is not necessarily optimal for this application. It is an intuitively appealing rule in the sense that the DWT decomposes signals into dyadic subbands which localize narrowband interference. Assuming the jammer to signal energy ratio (JSR) is sufficiently high, this localization causes the coefficients in the frequency bin where the narrowband interference lies to be significantly greater than the rest of the transform coefficients. Except for high frequency interferers, excising the highest ten percent of the coefficients is sufficient to remove the interference. Unfortunately, a significant portion of the data is lost as well.
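The excision rule can be sketched as follows. The wavelet used in the study is not stated, so an orthonormal Haar transform is assumed here purely to keep the example self-contained; the function names and the test signal are also hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """Full-depth orthonormal Haar DWT of a length-2^J signal.

    Returns the detail bands from finest to coarsest, then the final approximation.
    """
    coeffs = []
    a = np.asarray(x, dtype=float)
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
        coeffs.append(d)
    coeffs.append(a)
    return coeffs

def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2.0)
        up[1::2] = (a - d) / np.sqrt(2.0)
        a = up
    return a

def excise_largest(coeffs, fraction=0.10):
    """Zero the given fraction of coefficients with the largest magnitude."""
    flat = np.concatenate(coeffs)
    n_zero = int(np.ceil(fraction * len(flat)))
    flat[np.argsort(np.abs(flat))[-n_zero:]] = 0.0
    out, pos = [], 0
    for c in coeffs:
        out.append(flat[pos:pos + len(c)])
        pos += len(c)
    return out

# Hypothetical received chips plus a strong low-frequency sinusoidal jammer.
rng = np.random.default_rng(1)
chips = rng.choice([-1.0, 1.0], size=1024)
jammer = 30.0 * np.cos(2.0 * np.pi * np.arange(1024) / 64.0)
received = chips + jammer

coeffs = haar_dwt(received)
roundtrip = haar_idwt(coeffs)
excised = excise_largest(coeffs, 0.10)
cleaned = haar_idwt(excised)
n_zeroed = int(np.sum(np.concatenate(excised) == 0.0))
```

Since the transform is orthonormal, zeroing the largest coefficients removes their energy from the reconstruction directly. The same step also discards whatever part of the desired signal was carried by those coefficients, which is the data loss noted above.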

The communication system simulated in this work consisted of a BPSK signal with a pseudo-noise (PN) spreading code applied at the transmitter, additive white Gaussian noise (AWGN), and constant frequency and frequency modulated continuous wave (CW) jammers with energies. At the receiver, the DWT output was sent to the excision block followed by despreading of the wideband signal and a correlator. A signum based decision rule provided the data estimates. Spreading codes of 128 and 32 chips per bit were employed with a 10% jammer duty cycle. Typically 128 bits were simulated at a time and an 11- or 13-level DWT (12 or 14 dyadic frequency slots) was performed on the entire spread sequence. At 128 chips per bit, this means a transform length of $128^{2} = 2^{14}$, hence the choice for the number of levels in the DWT. The sampling rate for the system was chosen to be 1 sample per chip, effectively limiting the highest frequency jammer to half the chip rate. However, the first-null bandwidth of SS-BPSK is the chip rate, and hence only jammers in the lower half of the spectrum are considered. This is not a crucial issue, since TFD methods are not frequency dependent and so they are unaffected by this limitation, and the tiling of the DWT is such that signals with frequencies in the upper half of the spectrum only worsen the performance.

4.  Simulations 

Fig.(2-a) compares the bit error rates of the TFD and DWT excision methods in the case of a pulsed sinusoid with normalized frequency (1/7.1), using 128 chips/bit. In addition, the BER corresponding to no preprocessing is also shown. For the TFD method, we have included the BERs with exact IF as well as with estimates of the IF using equation (1) with 128-bin and 8-bin FFTs. It is clear that all TFD BER curves are significantly better than that of the DWT method. The 128-bin FFT outperforms the 8-bin FFT, due to the bias caused in the IF estimate by using fewer frequency bins. It is noteworthy that the exact IF provides no errors up to an 80 dB jammer-to-signal ratio.

Fig.(2-b) shows the same set of curves as Fig.(2-a), except that we now use 32 chips/bit. The relative behavior of the two time-frequency excision methods remains approximately the same. Overall, the reduction in the gain leads to an increase in the bit error rate across the JSR range. The experiments conducted for Fig.1 were repeated using a higher frequency (1/2.3). The corresponding BER curves are shown in Fig.(2-a,b). The superiority of the TFD methods remains unchanged. Fig.3 shows the BER curves for the case of a pulsed chirp jammer, using 128 and 32 chips/bit. The performance of the TFD is slightly degraded relative to the case of a fixed sinusoid. Still, the TFD method performs remarkably well and improves drastically over the DWT performance.

5.  Conclusions 

The interference excision system based on time-frequency distributions, shown in Fig.1, outperforms the wavelet transform excision method for constant-envelope pulsed interference of either constant or modulated frequency. Using exact IF information yields better results than IF estimates, but this is to be expected. In defense of multiresolution methods, however, it should be noted that the jammer types considered here are not conducive to MRA decompositions and, as a matter of fact, a regular FFT outperforms the wavelet in this scenario, especially for constant frequency jammers. Pulsed interference without IF information (bursts of uncorrelated energy) was not considered in this study, but it is suggested that the TFD methods would not perform as well in this case, and the performance of the wavelet excision scheme would improve.

Abstract

This paper describes a non-linear adaptive equalizer based on a sub-optimal HMM formulation leading to a small computational complexity. A similar approach was already proposed in the monochannel case in [4], and we show here that, in a multichannel context, large improvements are obtainable. It is well known that Maximum Likelihood methods are subject to local minima problems. Although of reduced importance in our previous approach (due to the on-line adaptation), the problem was still present. Since it is now well known that in the multichannel case the blind equalization problem has a unique minimum, one can hope that the local minima problems can be solved in this context. However, a straightforward formulation of the previous algorithm in the multichannel case does not achieve this. Hence, we propose a new algorithm allowing Conditional Mean estimates of the emitted symbols and blind identification of each impulse response of the channels, involving altogether a maximum likelihood formulation (by means of an approximated EM algorithm) and a criterion making use of the spatial diversity of the multichannel system. Simulations are provided, showing the identification of the impulse responses of the various channels, as well as the symbol estimation performance in terms of Bit Error Rate (BER). The improvements over the single channel case are highlighted.


1.  Introduction 

We consider here reception through multiple sensors. In this case, the various sensors receive different continuous-time waveforms due to the different physical channels that separate them from the transmitter. However, after sampling at the symbol rate, the corresponding received discrete sequence can be modeled as the output of a Finite Impulse Response (FIR) Filter.

In recent years, following the work by Tong, Xu, and Kailath [5], many methods have been proposed in order to equalize such systems, relying on the fact that the received signals have a rank-deficient correlation matrix [1]. Initially, these methods were proposed in block versions. This block formalism does not allow tracking of the channels (when they are time-varying) and has the drawback that the corresponding arithmetic complexity is required in bursts, a fact which either requires large hardware or introduces a large delay. Very few methods allow on-line processing of the data as they come. Furthermore, they usually do not explicitly take into account the effect of noise. Finally, a common feature of these methods is that they rely on structural properties of the channel, meaning that they do not use any a priori knowledge about the input, which is often available at no cost in a communication situation. For instance, it can be useful to take advantage of the fact that the emitted symbols are taken from a discrete finite alphabet.

The algorithm derived in this paper is an adaptive one, providing at each step an estimate of the impulse responses of the multiple channels (thanks to the combination of two criteria which results in a good tradeoff between residual error and sensitivity to initialization), as well as a Conditional Mean (CM) estimate of the symbols currently stored in the channel memory. The Hidden Markov Model (HMM) is used here in a sub-optimal way so that it does not involve the high computational complexity which is the usual drawback of such an approach. Note that the model formulation is usable at a reasonable cost only by the use of the a priori knowledge that the emitted sequence belongs to a finite alphabet. Note also that all computations provided in this paper are given in real variables. The extension to complex ones is straightforward.

2. Problem formulation

Notations are as follows:

- $V^{T}$ denotes the transpose of vector V.

- $\theta^{(n)} = [\theta^{(n)}(0) \ldots \theta^{(n)}(L-1)]^{T}$ denotes the impulse response of the channel from the transmitter to the nth antenna (L being the channel memory length of all impulse responses).

- $b^{(n)}(t)$ is the additive noise on the nth antenna. We assume these additive noises to be mutually uncorrelated. N being the number of sensors, $B_t$ denotes the following vector:

$B_t = [b^{(1)}(t) \ldots b^{(N)}(t)]^{T}$

- The transmitted sequence x is independent and identically distributed (iid), and can take M different values $q_k$, $k = 1 \ldots M$, depending on the modulation. $X_t$ is the vector containing all symbols stored in the channel memory at time t:

$X_t = [x(t) \; x(t-1) \ldots x(t-L+1)]^{T}$

Then, the signal received at time t on the nth antenna is given by:

$y^{(n)}(t) = \sum_{i=0}^{L-1} \theta^{(n)}(i) \, x(t-i) + b^{(n)}(t)$  (1)

Thus, we have the stationary model:

$Y_t = \Theta^{T} X_t + B_t$  (2)

with matrix $\Theta$ and vector $Y_t$ defined as:

$\Theta = [\theta^{(1)} \ldots \theta^{(N)}]$  (3)

$Y_t = [y^{(1)}(t) \ldots y^{(N)}(t)]^{T}$  (4)

The model described by (2) defines a Hidden Markov Model in which $X_t$ is the state vector of a Markov process described by the following state equation (T being a shift matrix):

$X_{t+1} = T X_t + x(t+1) \, [1 \; 0 \ldots 0]^{T}$

This hidden Markov process is only reachable through the observation equation (2), which is identical to that corresponding to a transmission through a single channel. However, the hidden process can be reached through N different observations, which is the explanation for the improved performance of the multichannel algorithm in terms of BER.
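The observation model can be simulated in a few lines. In this sketch the channel matrix holds one impulse response per column, the state vector is rebuilt at each time step from the symbol stream, and the tap values, noise level and function name are illustrative assumptions rather than the exact experimental settings.

```python
import numpy as np

def simulate_multichannel(symbols, Theta, noise_std, rng):
    """Generate Y_t = Theta^T X_t + B_t for t = L-1, ..., T-1.

    Theta has shape (L, N): column n is the impulse response of channel n.
    Returns an array of shape (T - L + 1, N), one row per time step.
    """
    L, N = Theta.shape
    T = len(symbols)
    Y = np.empty((T - L + 1, N))
    for row, t in enumerate(range(L - 1, T)):
        X_t = symbols[t - L + 1:t + 1][::-1]      # [x(t), x(t-1), ..., x(t-L+1)]
        Y[row] = Theta.T @ X_t + noise_std * rng.standard_normal(N)
    return Y

rng = np.random.default_rng(3)
symbols = rng.choice([-1.0, 1.0], size=200)       # BPSK alphabet, M = 2
Theta = np.array([[0.15, 0.3],
                  [0.90, 0.3],
                  [0.30, 0.3]])                   # hypothetical taps, L = 3, N = 2
Y_clean = simulate_multichannel(symbols, Theta, 0.0, rng)
Y_noisy = simulate_multichannel(symbols, Theta, 0.3, rng)
```

With the noise switched off, each column of the output coincides with the convolution of the symbol stream with the corresponding impulse response, which is Eq. (1) written for all antennas at once. All N observations at time t depend on the same hidden state $X_t$.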

3  Conditional  Mean  (CM)  estimate  of  the 
emitted  sequence 

Suppose that current estimates of the channels $\theta_t^{(n)}$, n = 1, ..., N, and of the state vector are available at time t. The state probabilities corresponding to the Forward recursion [2] (i.e. $\Pr(X_t = \xi_l \mid Y_1, \ldots, Y_t)$) would be very computationally demanding, even for moderate length channels, since it requires the calculation of $M^{L}$ probabilities at each step. We use here the approximation derived in [4], which allows the amount of such calculations to be only $M \times L$: in this approach, instead of computing the joint probability of all components of vector $X_t$, we evaluate the probability of each component separately, conditioned by the current prediction of the other ones. This prediction is obtained by taking advantage of the shift structure of the process X. With $X_{t|t-1}^{(i)}$ the prediction of the ith component of vector $X_t$, knowing the observations up to time t-1, we have $X_{t|t-1}^{(i)} = X_{t-1|t-1}^{(i-1)}$. Let $\alpha_t^{(j)}(k)$ be the probability that the jth symbol in the channel memory be equal to $q_k$, knowing the observations up to time t, the current estimate of the channel parameters ($\Theta_t$), and the prediction of all the other symbols stored in the channel memory. Then, thanks to the so-called forward recursion, we can write:

$\alpha_t^{(j)}(k) = \Pr(x_t^{(j)} = q_k \mid Y_1, \ldots, Y_t, \Theta_t, X_{t|t-1}^{(l)}, l \neq j)$

=  (5)

where $\mathcal{N}$ is the N-dimensional Gaussian distribution, and $X_{t|t-1}(j, q_k)$ is the vector $X_{t|t-1}$ in which the jth component is replaced by the possible choice $q_k$ in the alphabet:

$X_{t|t-1}(j, q_k) = [\ldots, X_{t|t-1}^{(j-1)}, q_k, X_{t|t-1}^{(j+1)}, \ldots]^{T}$

Then, the estimate of vector $X_t$ is given by:

$X_{t|t} = [X_{t|t}^{(0)} \ldots X_{t|t}^{(L-1)}]^{T}$

where $X_{t|t}^{(j)}$ is the Conditional Mean Estimate of $x_{t-j}$:

$X_{t|t}^{(j)} = \sum_{k=1}^{M} q_k \, \alpha_t^{(j)}(k)$

The  estimate  of  the  emitted  sequence  being  performed, 
we  now  focus  on  the  update  of  the  multi-channel  impulse 
response. 

4.  Estimation  of  the  channel  parameters  by  a 
combination  of  two  criteria 

The parameters are estimated by minimizing a criterion which is a linear combination of a criterion based on the spatial diversity of the system, $C_t(\Theta)$, on one side, and of $L_t(\Theta)$, the expected log likelihood, on the other side. Both criteria are evaluated with an exponential forgetting factor $\lambda$, as we wish to track slow variations of the channels:

$Q_t(\Theta) = \sum_{i=1}^{t} \lambda^{t-i} [(1 - \alpha) L_i(\Theta_i, \Theta) + \alpha C_i(\Theta)]$  (6)

4.1. Calculation of the expected log-likelihood

The criterion we deal with in this section is the so-called Kullback-Leibler function of the Expectation-Maximisation (EM) algorithm, defined as the expectation of the logarithm of the likelihood function for the complete data calculated at time t (see [3] for the terminology):

$L_t(\Theta_t, \Theta) = E\{\log \mathcal{N}(Y_t, X_t; \Theta) \mid Y_1, \ldots, Y_t; \Theta_t\}$

$L_t(\Theta_t, \Theta) = \sum_{l=1}^{M^{L}} \Gamma_t(l) \log \mathcal{N}(Y_t, \xi_l; \Theta)$  (7)


Only simulations support this claim at this time. Physically, the criterion relies on the fact that the output of each antenna corresponds to the filtering of the SAME input vector by different filters. Consider 2 antennas among the N: if we filter one received signal by the impulse response of the other antenna and vice versa, both outputs will have been filtered by the same coefficients, and hence should be equal (up to the disturbances introduced by the noise). This criterion can be written in many different ways. We have chosen:

$C_t(\Theta) = \sum_{n \neq m} | y_t^{(m)} \theta^{(n)} - y_t^{(n)} \theta^{(m)} |^{2}$  (9)

where $y_t^{(n)} = [y^{(n)}(t) \ldots y^{(n)}(t-L+1)]$. Note that other criteria share the same property and could be used in conjunction with the approximate log-likelihood.
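The cross-filtering property behind this criterion is easy to check numerically. In the sketch below, each noiseless received signal is filtered by the candidate impulse response of the other channel and the squared difference is accumulated over all valid samples; the function name and tap values are assumptions chosen for illustration.

```python
import numpy as np

def cross_relation_cost(y_n, y_m, theta_n, theta_m):
    """Sum over time of |y_n * theta_m - y_m * theta_n|^2 (valid samples only)."""
    e = (np.convolve(y_n, theta_m, mode="valid")
         - np.convolve(y_m, theta_n, mode="valid"))
    return float(np.sum(e ** 2))

rng = np.random.default_rng(4)
x = rng.choice([-1.0, 1.0], size=300)             # common BPSK input
theta_1 = np.array([0.15, 0.9, 0.3])              # hypothetical channel 1
theta_2 = np.array([0.3, -0.5, 0.8])              # hypothetical channel 2
y_1 = np.convolve(x, theta_1)                     # noiseless outputs
y_2 = np.convolve(x, theta_2)

cost_true = cross_relation_cost(y_1, y_2, theta_1, theta_2)
wrong_theta_1 = np.array([0.9, 0.35, 0.0])        # a wrong candidate for channel 1
cost_wrong = cross_relation_cost(y_1, y_2, wrong_theta_1, theta_2)
```

With the true impulse responses the cost vanishes, because convolution is commutative and both branches equal the input filtered by the cascade of the two channels. A wrong candidate leaves a nonzero residual, which is what allows the criterion to discriminate between candidate channel estimates.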


where $\xi_l$ is one of the possible realizations of vector $X_t$ and $\Gamma_t(l)$ is the conditional probability of the state: $\Gamma_t(l) = \Pr(X_t = \xi_l \mid Y_1, \ldots, Y_t, \Theta_t)$.

Because of the approximation developed in the previous section, which consists in computing conditional probabilities instead of joint probabilities, we have to deal with the so-called "pseudo-likelihood": basically, we approximate each $\Gamma_t(l)$, which is defined as the joint probability on every component of $X_t$, by the product of the conditional probabilities of each component, given the prediction of the other ones:


4.3  Maximization  step 

The maximization over each channel $\theta^{(n)}$ is performed by computing the partial derivatives of $Q_t(\Theta)$ with respect to $\theta^{(n)}$. Finally, the channel estimates are obtained recursively by:


a(") 

^t+i 


(")  +  (1  _  -  ft’^^^Xtit)X 


+al^ 


(n)-l 


)X 


t\t 

(m)T 


in^n 


$\Pr(X_t = [q_{i_0}, q_{i_1}, \ldots, q_{i_{L-1}}]^{T} \mid Y_1, \ldots, Y_t, \Theta_t) = \prod_{j=0}^{L-1} \alpha_t^{(j)}(i_j)$

The expansion of this calculation (for high SNR levels) leads to the following expression of the expected pseudo-likelihood at time t:


N 


(8) 


/=! 


$R_t$ and $A_t^{(n)}$ are defined as follows:

$R_t = \lambda R_{t-1} + X_t X_t^{T}$

$A_t^{(n)} = \lambda A_{t-1}^{(n)} + y_t^{(n)T} y_t^{(n)}$  (10)

Their inverses can be computed recursively using the matrix inversion lemma ([4]).
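For a rank-one update with forgetting factor, the matrix inversion lemma reduces to the Sherman-Morrison formula. The sketch below propagates the inverse directly and compares it with a brute-force inversion; the function name, initial value and forgetting factor are assumptions for illustration, and the matrix is assumed symmetric.

```python
import numpy as np

def update_inverse(R_inv, x, lam):
    """Given R_inv = R_{t-1}^{-1} (symmetric), return (lam * R_{t-1} + x x^T)^{-1}."""
    Rx = R_inv @ x
    return (R_inv - np.outer(Rx, Rx) / (lam + x @ Rx)) / lam

rng = np.random.default_rng(2)
lam, L = 0.99, 3
R = 0.1 * np.eye(L)                    # hypothetical initialization
R_inv = np.linalg.inv(R)
for _ in range(50):
    x = rng.choice([-1.0, 1.0], size=L)
    R = lam * R + np.outer(x, x)       # direct recursion on the matrix
    R_inv = update_inverse(R_inv, x, lam)   # recursion on its inverse
R_inv_direct = np.linalg.inv(R)
```

Each update costs on the order of $L^{2}$ operations instead of the $L^{3}$ of a full inversion, which helps keep the adaptive algorithm cheap at every time step.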

5.  Experimental  results 


4.2  Criterion  based  on  the  spatial  diversity  of  the 
system 

We use the criterion described in [5]. Under the assumption that the impulse responses of the various channels have no common zeros, it can be shown [5] that this criterion has a single solution. This property is useful in our case since the EM algorithm is known to have local minima. Moreover, even if the log likelihood involved in this multi-channel case takes explicitly into account the effects of the noise, it does not take full advantage of the spatial diversity of the system. It is expected that a suitable weighting of both criteria can solve the local minima problem, while maintaining a robustness towards noise close to that of the EM algorithm.


5.1.  Adaptive  behavior 

First  consider  the  adaptive  behavior  of  the  algorithm,  on 
a  BPSK  modulation,  with  A  =  2,  on  non-minimum  phase 
channels,  =  [0.150.9  0.3]  and  =  [0.30.3  0.3], 

Usefulness  of  the  criterion  based  on  diversity  Fig.l 
shows  the  evolution  of  the  taps  of  the  first  channel 
using  a  straightforward  extension  of  the  EM  algorithm  used 
in  [4]  (no  spatial  diversity  explicitly  taken  into  account).  It 
is  seen  that  the  algorithm  converges  to  a  local  minimum  of 
the  likelihood  corresponding  to  a  minimum-phase  channel 
;  ftO)  =  [0.9  0.35  0].  Fig2  corresponds  to  a  =  0.3,  and  the 
algorithm  converges  to  the  true  parameters. 
Usefulness of the MAP estimate. The usefulness of
$L_t(H)$ in the criterion is easily seen by comparing, in Fig. 3,
the MSE of the parameter estimates for $\alpha = 1$ ($C_t$ only)
and $\alpha = 0.3$. As the likelihood takes the noise effects into
account more efficiently, the parameters produced by its
minimization are more accurate than those obtained when
minimizing $C_t(H)$ only. Both simulations were initialized
to the same values, [1 0 0], and performed with
a 10 dB SNR at the output of the channels.

5.2.  BER  results 

Fig. 4 compares the BER obtained by using two channels
($h^{(1)}$ and $h^{(2)}$ together) to that obtained using a single
channel (N = 1, on either $h^{(1)}$ or $h^{(2)}$). The improvement
is significant, while the computational cost involved for the
multi-channel case is still linear in the channel memory.

6.  Conclusion 

This paper proposes a new algorithm, which couples
two different and complementary blind equalization methods.
The first one, based on ML identification of the channel
taps and detection of the symbols through an HMM
formulation, brings robustness towards noise. The other one
is based on a criterion involving the spatial diversity of the
system and tends to constrain the solution to be unique. The
proposed algorithm is shown to take advantage of the
complementarity of both criteria, in particular avoiding the
problem of the local minima of the likelihood, while providing
accurate results in case of poor SNR. Moreover, the fact that
the algorithm is adaptive allows real-time computation
without the high computational complexity of the classical
HMM-based methods. This has been obtained by the use
of a suboptimal HMM formulation which nevertheless has
good efficiency.
Abstract

We present in this paper a multiple objective optimization
approach to fast blind channel equalization. By first
investigating the performance (mean-square error) of
the standard fractionally spaced CMA equalizer in the
presence of noise, we show that CMA local minima exist
near the minimum mean-square error (MMSE) equalizers.
Consequently, CMA may converge to a local minimum
corresponding to a poorly designed MMSE receiver
with considerably large mean-square error. Based on
multiple objective optimization techniques, we next
propose a blind channel estimator that simultaneously
exploits the second-order cyclostationary statistics and
the constant modulus of QAM-type communication signals.
Such a channel estimation-based blind equalization
scheme has the advantage of designing an FIR minimum
mean-square error equalizer with the optimal delay.

1.  INTRODUCTION 

Blind equalization has the potential to improve the efficiency
of communication systems by eliminating training signals.
Difficulties of its application in wireless communications,
however, are due largely to the characteristics
of the propagation media - multipath delays and
fast fading. The challenge is achieving blind equalization
using only a limited amount of data.

A widely tested algorithm is the constant modulus algorithm
(CMA) [5, 10]. In the absence of noise, under
the condition of channel invertibility, the CMA converges
globally for symbol-rate IIR equalizers and fractionally
spaced FIR equalizers [4, 6]. It is shown in [3]
that CMA is less affected by the ill-conditioning of the
channel. However, Ding et al. [2] showed that CMA
may converge to some local minimum for the symbol-rate
FIR equalizer. In the presence of noise, the analysis
of convergence of CMA is difficult and few conclusive
results are available. Another drawback of CMA is that
its convergence rate may not be sufficient for fast fading
channels.

This work was supported in part by the National Science
Foundation under Contract NCR-9321813 and by the Advanced
Research Projects Agency monitored by the Federal
Bureau of Investigation under Contract No. J-FBI-94-221.

Another approach to blind equalization is based
on blind channel estimation. Some of the recent
eigenstructure-based channel estimators (see e.g.
[7, 8]) require a relatively small data size compared
with higher-order statistical methods. However, the
asymptotic performance of these eigenstructure-based
schemes is limited by the condition of the channel
[12, 13]. Specifically, the asymptotic normalized
mean-square error (ANMSE) is lower bounded by the
condition number of the channel matrix. Unfortunately,
frequency selective fading channels with long multipath
delays often result in ill-conditioned channel matrices.

The key idea of this paper is to combine the approach
based on minimizing the constant modulus cost and
that based on matching the second-order cyclostationary
statistics. The main feature of the proposed approach
is the improved convergence property over
standard CMA equalization and the improved robustness
for ill-conditioned channels.

2.  THE  MODEL 

A fractionally sampled channel and its equalizer can be
represented by the cascade of a single-input multiple-output
(SIMO) channel and a multiple-input single-output
(MISO) equalizer. The system equations are
given by

$$x_k^{(i)} = \sum_{j=0}^{L_h-1} h_j^{(i)} s_{k-j} + w_k^{(i)} \qquad (1)$$

$$y_k = \sum_{i=1}^{M} \sum_{j=0}^{L_f-1} f_j^{(i)*} x_{k-j}^{(i)} \qquad (2)$$

where $h^{(i)}$, $f^{(i)}$ are the $i$th (sub)channel and its equalizer,
with lengths $L_h$, $L_f$ respectively, and $s$, $w$, $x$, $y$ are the
transmitted symbol, additive noise, received data and equalizer
output respectively.

0-8186-7576-4/96  $5.00  ©  1996  IEEE

In matrix form, we have


$$x_k = H s_k + w_k \qquad (3)$$

$$y_k = f^H x_k = q^H s_k + f^H w_k \qquad (4)$$

$$q = H^H f, \qquad (5)$$

where $(\cdot)^H$ denotes the Hermitian transpose, H is the channel matrix and
q is the combined channel. We shall make the following
assumptions:

A1: The input sequence $\{s_k\}$ is zero-mean and
$E\{s_k s_l^*\} = \delta(k-l)$. $s_k$ has the constant modulus (CM)
property $|s_k| = 1$.

A2: Noise is zero-mean, white Gaussian with variance $\sigma^2$.

3.  PROPERTIES  OF  CMA  EQUALIZERS

The analysis in this section is restricted to real parameters
and an equiprobable binary source. Generalizations
to the complex case are readily available. The CMA
minimizes

$$J_c(f) = E\{(y_k^2 - 1)^2\} \qquad (6)$$

$$\phantom{J_c(f)} = 3\|f\|_R^4 - 2\|f\|_R^2 - 2\|q\|_4^4 + 1, \qquad (7)$$

where $\|f\|_R$ is the 2-norm defined by $\sqrt{f^H R f}$, $\|q\|_4$ is the
4-norm defined by $\left(\sum_i |q_i|^4\right)^{1/4}$, and

$$R \triangleq E\{x_k x_k^H\} = H H^H + \sigma^2 I. \qquad (8)$$

In [14], it has been shown that CMA equalizers must
be in the signal subspace spanned by the columns of H.
Therefore the analysis of CMA can be carried out in
terms of the combined channel q defined in (5). The equivalent
CMA cost function is then given by

$$J(q) = J_c((H^H)^\dagger q) = 3\|q\|_\Phi^4 - 2\|q\|_\Phi^2 - 2\|q\|_4^4 + 1, \qquad (9)$$

where $\Phi = H^\dagger R (H^H)^\dagger$ and $(\cdot)^\dagger$ denotes the pseudoinverse.
In the absence of channel noise,
it has been shown that the CMA using fractionally-spaced
equalizers converges globally [6] to one of the zero-forcing
equalizers, i.e., $q_c = e_\nu$ for any admissible delay $\nu$, where
$e_\nu$ is a unit column vector with 1 at the $\nu$th entry and
zero elsewhere. In the presence of noise, some minima
may become local minima. In this section, we study
the locations of these CMA equalizers. Specifically, we
will study the neighborhoods of MMSE equalizers which
minimize

$$J_m^{(\nu)}(f) \triangleq E\{(y_k - s_{k-\nu+1})^2\}, \qquad (10)$$

where $\nu$ is the delay of the equalizer. Note that the
CMA does not have control of the delay $\nu$ due to the
nature of blind equalization.

There are several reasons to choose this type of region.
Since the MMSE equalizer is the optimal linear
equalizer, any equalizer which is far away from it has
a large MSE. Therefore, if there exist CMA local minima
in these regions, one of the minima must be the
optimum CMA equalizer which has the minimum MSE.
The other reason is the strong relationship between the
MMSE equalizer and the CMA equalizer. This can be
seen when the noise approaches zero.

Without loss of generality, let us consider q in the
neighborhood of the MMSE equalizer $q_m$ at delay $\nu = 1$.
q and $\Phi$ can be partitioned into

where $q_I$ is the intersymbol interference part, $\theta$ represents
the signal energy, and $1 - \theta$ is the bias between
the ZF equalizer and q.
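
The closed form (7) of the CMA cost can be checked numerically in the real, binary-source case considered in this section. The sketch below compares the closed form with a direct expectation obtained by enumerating all symbol vectors and using the Gaussian moment identities for the noise. The example channel matrix, equalizer and noise variance are arbitrary values chosen for illustration, and the function names are not from the paper.

```python
from itertools import product


def combined_channel(H, f):
    """q = H^T f for a real channel matrix H (rows index received samples)."""
    rows, cols = len(H), len(H[0])
    return [sum(H[i][j] * f[i] for i in range(rows)) for j in range(cols)]


def cma_cost_closed_form(H, f, sigma2):
    """Closed form 3||f||_R^4 - 2||f||_R^2 - 2||q||_4^4 + 1 with R = H H^T + sigma2 I."""
    q = combined_channel(H, f)
    f_R2 = sum(v * v for v in q) + sigma2 * sum(v * v for v in f)  # f^T R f
    q44 = sum(v ** 4 for v in q)
    return 3 * f_R2 ** 2 - 2 * f_R2 - 2 * q44 + 1


def cma_cost_by_enumeration(H, f, sigma2):
    """E{(y^2 - 1)^2} with y = q^T s + n, s equiprobable in {-1, +1}^K,
    n ~ N(0, v), v = sigma2 * ||f||^2, using E[n^2] = v and E[n^4] = 3 v^2."""
    q = combined_channel(H, f)
    v = sigma2 * sum(x * x for x in f)
    total = 0.0
    for s in product((-1.0, 1.0), repeat=len(q)):
        a = sum(qj * sj for qj, sj in zip(q, s))
        ey4 = a ** 4 + 6 * a * a * v + 3 * v * v   # E[(a + n)^4]
        ey2 = a * a + v                            # E[(a + n)^2]
        total += ey4 - 2 * ey2 + 1
    return total / 2 ** len(q)


H_example = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]  # arbitrary illustrative values
f_example = [0.7, -0.2]
closed = cma_cost_closed_form(H_example, f_example, 0.1)
direct = cma_cost_by_enumeration(H_example, f_example, 0.1)
```

The two values agree to rounding error, which reflects the identity $E\{(q^T s)^4\} = 3\|q\|_2^4 - 2\|q\|_4^4$ for an i.i.d. binary source.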

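In order to locate the CMA equalizer (the minimum
point), we need the following definitions, given the MMSE
equalizer $q_m = \theta_m [1, q_{mI}^H]^H$:

$$\delta \triangleq \|q_I - q_{mI}\|_C \qquad (12)$$

$$\theta_c^2 \triangleq \frac{\theta_m}{3 - 2\theta_m^2 - 2\theta_m^2 \|q_{mI}\|_4^4} \qquad (13)$$

$$c_0 \triangleq \frac{1}{3 - 2\theta_m^2 - 2\theta_m^2 \|q_{mI}\|_4^4} \qquad (14)$$

$$c_1(\delta) \triangleq -2\left(\delta^2 + \frac{1}{\theta_m}\right) \qquad (15)$$

$$c_2(\delta) \triangleq 3\left(\delta^2 + \frac{1}{\theta_m}\right)^2 - 2\left(1 + (\delta + \|q_{mI}\|_4)^4\right) \qquad (16)$$

$$D(\delta) \triangleq c_1(\delta)^2 - 4 c_2(\delta) c_0. \qquad (17)$$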

The following theorem gives a sufficient condition for the
existence of a local minimum, and gives its location and the
size of the region.

Theorem  1  Under  the  condition  that  J^(fTn)  <!■»*/ 
L^dlqmilh)  <  0;  then  there  exists  a  local  minimum  in 

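$$B = \{0 < \delta < \delta_u,\ \theta_l < \theta < \theta_u\}, \qquad (18)$$

where

$$\delta_u = \inf_{\delta > 0,\ D(\delta) < 0} \{\delta\} \qquad (19)$$

$$\theta_l = \min_{0 < \delta < \delta_u} \left\{ \frac{-c_1(\delta) - \sqrt{c_1(\delta)^2 - 4 c_2(\delta) c_0}}{2 c_2(\delta)} \right\} \qquad (20)$$

$$\theta_u = \max_{0 < \delta < \delta_u} \left\{ \frac{-c_1(\delta) + \sqrt{c_1(\delta)^2 - 4 c_2(\delta) c_0}}{2 c_2(\delta)} \right\} \qquad (21)$$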

This theorem provides an expression to determine
the cylindrical region B which includes the
CMA equalizer. The procedure only needs the parameters
of the MMSE equalizer.

Perhaps the most interesting concern is the MSE of
the CMA equalizer. With the result of Theorem 1, we
are ready to give the answer.

Theorem  2  (a)  The  MSE  of  the  CMA  equalizer  in  B
is  bounded  by


+  elsh,  (22) 


Aeu 


where  AS  is  the  extra  MSE,  i.e.,  AS  =  — 

(b) Let $\Delta\mathcal{E}_u$ be the upper bound of the CMA equalizer
associated with delay $\nu$; then the MSE of the optimum
CMA equalizer is upper bounded by

Sc*  —  min{^^  -f  ASu}.  (23) 

1/ 

(c)  The  MSE  of  CMA  is  approximated  by 

=^£l  +  0{£l).  (24) 

t'Tn. 

The consequence of these theorems is twofold: (i) the
CMA equalizers are very close to the MMSE equalizers;
(ii) there may exist a CMA local minimum in the
neighbourhood of an MMSE equalizer which has a significantly
large MSE.


4.  THE  MULTIPLE  OBJECTIVE
OPTIMIZATION  APPROACH

To avoid the undesirable local minimum of CMA, one
can use the channel estimation based equalization approach.
Once the channel is estimated, an MMSE equalizer
can be constructed by selecting the optimal $\nu$ in
(10). Furthermore, this approach provides the flexibility
to design other types of receivers, such as a decision-feedback
equalizer or a maximum likelihood sequence estimator.

Considered in this paper are the costs associated with
the constant modulus property $J_{CM}(h)$, the second-order
statistics $J_{CF}(h)$, and the observed data $J_Q(h)$:

$$J_{CM}(h) = \sum_k (|y_k|^2 - 1)^2 \qquad (25)$$

$$J_{CF}(h) = \sum_{i,j,m} |\hat{r}_{ij}(m) - r_{ij}(m)|^2 \qquad (26)$$

$$J_Q(h) = h^H Q h. \qquad (27)$$

Note that the optimization of $J_Q(h)$ leads to, among
a number of eigenstructure-based algorithms, the least-squares
[7] or the subspace channel estimators [8]. Matrix
Q in $J_Q(h)$ can be obtained from the data directly.

Both $J_{CF}(h)$ and $J_Q(h)$ involve the second-order statistics
(in different ways) whereas $J_{CM}(h)$ involves the
higher-order statistics. We present next the weighting
and the constrained approaches, two frequently used
techniques in multiobjective optimization, for the
optimization of the above cost functions.

4.1.  The  CM-CF  Algorithm

The CM-CF algorithm is derived from the weighted
optimization of the constant modulus cost $J_{CM}(h)$ and
the correlation fitting cost $J_{CF}(h)$:

$$\hat{h} = \arg\min_{h \in \mathcal{H}} \underbrace{\alpha J_{CM}(h) + \beta J_{CF}(h)}_{J(h)}, \qquad (28)$$

where $\alpha$, $\beta$ are the weights of the two cost functions
respectively, and $\mathcal{H}$ is the subspace that contains the channel vector.
In practice, $\mathcal{H}$ may be constructed from the principal
component structure of the fading channel [11].

The difficulty of this optimization is that the explicit
form of the constant modulus cost $J_{CM}(h)$ as a function
of the channel is unknown. Fortunately, from the analysis
in Section 3, the constant modulus equalizer can be
approximated by the MMSE equalizer, which can be obtained
once the channel is estimated. A gradient search
is used to minimize $J(h)$,

$$h_{n+1} = h_n - \mu \nabla_h J(h_n), \qquad (29)$$

where $\mu$ is a step size.
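
The gradient search (29) on the weighted cost (28) can be sketched as follows. This is a toy illustration only: the two cost functions passed in are hypothetical stand-ins (a modulus-type penalty and a quadratic fitting term), not the paper's $J_{CM}$ and $J_{CF}$, and the gradient is approximated by central differences because no closed-form gradient is given in the text.

```python
def numerical_gradient(cost, h, eps=1e-6):
    """Central-difference approximation of the gradient of cost at h."""
    grad = []
    for i in range(len(h)):
        hp, hm = list(h), list(h)
        hp[i] += eps
        hm[i] -= eps
        grad.append((cost(hp) - cost(hm)) / (2 * eps))
    return grad


def weighted_gradient_search(j_cm, j_cf, h0, alpha, beta, mu, iterations):
    """Iterate h_{n+1} = h_n - mu * grad J(h_n), J = alpha*j_cm + beta*j_cf."""
    def j(h):
        return alpha * j_cm(h) + beta * j_cf(h)

    h = list(h0)
    for _ in range(iterations):
        g = numerical_gradient(j, h)
        h = [hi - mu * gi for hi, gi in zip(h, g)]
    return h


# Hypothetical toy costs, chosen so that both are minimized at `target`.
target = [0.6, 0.8]


def toy_cm(h):
    return (sum(v * v for v in h) - 1.0) ** 2   # penalizes ||h||^2 != 1


def toy_cf(h):
    return sum((v - t) ** 2 for v, t in zip(h, target))


h_hat = weighted_gradient_search(toy_cm, toy_cf, [1.0, 0.0],
                                 alpha=1.0, beta=1.0, mu=0.05, iterations=500)
```

In this toy setting the quadratic term removes the sign and rotation ambiguity that the modulus-type term alone would leave, which mirrors the motivation for combining the two criteria.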


4.2.  CMA  with  Subspace  Constraints

In the constrained approach, we consider the following
optimization

$$\hat{h} = \arg\min_h J_{CM}(h) \quad \text{subject to} \quad J_Q(h) \le \alpha \|h\|^2. \qquad (30)$$

When Q is constructed from the true covariance matrix
$R_{xx}$, the "true" channel is in the null space of Q
and the channel identification becomes one of finding
the eigenvector associated with the zero eigenvalue. When
the estimated covariance matrix is used and the channel
is close to being unidentifiable [9], the null space is no
longer easy to determine. It is therefore reasonable to
extend the subspace to include additional dimensions.
Mathematically, we may view this approach as restricting
the channel vector to a subspace in which the quadratic
cost $J_Q(h)$ is constrained by an upper bound. Let $\mathcal{V}$ be
the linear subspace in which

$$J_Q(h) \le \alpha \|h\|^2 \qquad (31)$$

for some pre-specified $\alpha$. As a suboptimal approach to
(30), the channel estimator is then obtained from the
following constrained optimization

$$\min_{h \in \mathcal{V}} J_{CM}(h). \qquad (32)$$

The above optimization can then be transformed into
an unconstrained optimization. It can be shown that
$\mathcal{V}$ can be obtained from the span of the eigenvectors
associated with the smallest several eigenvalues
of the matrix $B^H Q B$. A gradient-type optimization
similar to (29) is used.

5.  SIMULATIONS 

The class of two-ray multipath fading channels with
independently faded components is used in the simulation.
The channel impulse response is given by

$$h(t) = \sum_{i=1}^{2} a_i\, p(t - \tau_i), \qquad (33)$$

where $\{a_i\}$ are independent zero-mean complex Gaussian
variables; p(t) is the raised-cosine waveform with
roll-off factor 0.25 and a length of 6 symbol intervals.
Uniformly distributed in [0, 2T] (T is a symbol interval),
the delays $\{\tau_i\}$ are statistically independent. The
signal is sampled at twice the symbol rate.
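
A two-ray channel of the form (33) can be generated along the following lines. This is a minimal sketch under stated assumptions: the complex gains are drawn with unit variance per real and imaginary component, the pulse is truncated to three symbol intervals on each side of its peak, and the sampling grid starts at $t = -3T$; none of these details are specified in the text, and the function names are illustrative.

```python
import math
import random


def raised_cosine(t, T=1.0, rolloff=0.25):
    """Raised-cosine pulse with unit peak at t = 0."""
    x = t / T
    if abs(x) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * rolloff * x) ** 2
    if abs(denom) < 1e-9:
        # Limit value at |t| = T / (2 * rolloff)
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * rolloff * x) / denom


def two_ray_channel(rng, T=1.0, oversampling=2, pulse_span=6):
    """Sample h(t) = a_1 p(t - tau_1) + a_2 p(t - tau_2) at T / oversampling."""
    gains = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2)]
    delays = [rng.uniform(0.0, 2.0 * T) for _ in range(2)]
    half = pulse_span * T / 2.0
    # Support of h(t) is [-half, half + 2T]
    n_samples = int((pulse_span + 2) * oversampling) + 1
    taps = []
    for k in range(n_samples):
        t = -half + k * T / oversampling
        tap = 0j
        for a, tau in zip(gains, delays):
            u = t - tau
            if abs(u) <= half:          # truncated pulse
                tap += a * raised_cosine(u, T)
        taps.append(tap)
    return taps


channel = two_ray_channel(random.Random(0))
```

With two samples per symbol, the even- and odd-indexed taps form the two subchannels of the SIMO model used in Section 2.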

We compared the mean-square error of the equalized
channel using (i) the CM-CF approach; (ii) the CMA
with Subspace Constraints; (iii) the Least-Squares CMA
(LSCMA) [1]; (iv) the MMSE equalizer constructed
from the subspace channel estimator. The cumulative
percentage of the channel estimates for a fixed MSE is
computed and shown in Fig. 1. When compared with
the LSCMA algorithm (the dash-dot line), the proposed
algorithms (the solid and dashed lines) show considerable
improvement at small MSE, such as an MSE
less than 0.02, and the improvement is reduced as the MSE
increases.




Figure  1:  Performance  comparison. 
Abstract

In this paper we introduce several modifications to the
Baum&Welch (BW) formulas used to reestimate the
parameters of a Hidden Markov Model (HMM). The
estimated parameter is the channel impulse response (CIR)
of a communication system which is known to be time-varying.
With these modifications, the channel tracking
properties of a BW-based algorithm are improved. The
resulting algorithm is tested in a specific mobile radio
environment (the GSM system), exhibiting good
performance at the expense of higher computational
complexity.


1.  Introduction 

It is well known that no high-speed band-limited digital
communication can be carried out without the help of an
equalizer. Conventional approaches to the adjustment of this
equalizer require the transmission of a training sequence (i.e.
known a priori by the receiver and the transmitter), which
provides an accurate initial estimate for the equalizer taps;
afterwards, slighter adjustments can be made on-line to adapt
this first estimate to the almost always changing
environment. Of course, the transmission of these training
sequences, when possible, reduces the capacity of the
system. For that reason, there is an increasing interest in
blind equalizers [1,2,3], which deal with the problem of the
adjustment without training sequences (i.e. blindly).

In [3], an Estimation-Modification (EM) Viterbi-based
algorithm is proposed to perform jointly a Maximum
Likelihood (ML) channel estimation and sequence detection.
However, modelling the received signal as an HMM allows us
to make use of the complete theory developed for these
models. For example, the Baum&Welch (BW) algorithm
was proposed in [7] to estimate the parameters of the
channel and the characteristics of the modulation. This
algorithm is known to lead, at least, to a local maximum of
the likelihood function [4], which is not guaranteed by the
Viterbi algorithm (VA). In this paper, several modifications
to this previously proposed algorithm are introduced to cope
with the special features of mobile radio channels.

2.  Signal  model 

As mentioned before, the environment in which the new
algorithm is tested is the Pan-European Mobile Radio System,
also known as GSM. In this system, a constant-envelope
Gaussian Minimum Shift Keying (GMSK) modulation
scheme with equivalent bandwidth (BT) equal to 0.3 is used.
The access strategy is TDMA with 8 timeslots per carrier
and 156.25 bit-intervals per timeslot in Normal bursts. At the
chosen bit rate (270.8 kb/s), multipath propagation leads to
deep fades and to uncontrolled Intersymbol Interference
(ISI). In addition, due to the mobile nature of the receiver,
the Doppler effect is also observed.




Fig. 1: Transmission subsystem.

Taking into account the above-mentioned features, the signal
at the input of the BW detector can be modelled as:

$$x[n] = f(s[n]) + w[n] \qquad (1)$$

where f(.) is a non-linear function of the present state s[n],
and {w[n]} denotes a sequence of zero-mean Gaussian
variables with variance $\sigma^2$ (AWGN). If we go on developing
an expression for f(.) we get:

1=0  1=0 

where h and d are the baseband equivalents of $h_0$ and $d_0$.
For a modulation index of 0.5, $\phi[n]$ can be expressed as:
Differentiating each component in the sum $\varepsilon = \sum_i \varepsilon_i$ with
respect to $m_i^{(0)}$ and $m_i^{(1)}$ we obtain:

[^y,[«](4n]-mj°  -mf  -n)  • 


Ve?  = 


Lii=i 


(14) 


whose Hessian is positive definite unless

c1)  $\sum_{n=1}^{L} \gamma_i[n] = 0$  (15)

(i.e. that state was not observed along the timeslot), or
L 

c2) 


(16) 


n=l 


(i.e. that state was observed only once, when $n = n_i$). In those
cases, of course, there is no sense in looking for a linear
approximation. From equating the gradient to zero and
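carrying out proper transformations, we find that

$$m_i^{(0)} = \frac{C \sum_{n=1}^{L} \gamma_i[n]\, x[n] - B \sum_{n=1}^{L} \gamma_i[n]\, x[n]\, n}{\Delta} \qquad (17)$$

$$m_i^{(1)} = \frac{A \sum_{n=1}^{L} \gamma_i[n]\, x[n]\, n - B \sum_{n=1}^{L} \gamma_i[n]\, x[n]}{\Delta} \qquad (18)$$

$$\sigma^2 = \frac{1}{L} \sum_{n=1}^{L} \sum_{i=1}^{N} \gamma_i[n]\, \left| x[n] - m_i^{(0)} - m_i^{(1)} n \right|^2 \qquad (19)$$

where

$$A = \sum_{n=1}^{L} \gamma_i[n], \quad B = \sum_{n=1}^{L} \gamma_i[n]\, n, \quad C = \sum_{n=1}^{L} \gamma_i[n]\, n^2 \qquad (20)$$

$$\Delta = A \cdot C - B^2$$

provide the components of the desired vector and an estimate
for the variance of the AWGN.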
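
The time-dependent reestimation of one state's mean amounts to a weighted linear regression on the time index, with the state probabilities as weights. The sketch below solves the corresponding normal equations using the sums A, B, C and the determinant $\Delta$ defined above. It is an illustration under the assumption that the minimized cost is $\sum_n \gamma_i[n]\,|x[n] - m_i^{(0)} - m_i^{(1)} n|^2$; the function name and the tolerance are hypothetical.

```python
def fit_linear_mean(gamma_i, x, tol=1e-12):
    """Weighted least-squares fit of x[n] ~ m0 + m1 * n, n = 1..L.

    gamma_i[n-1] is the probability of being in state i at instant n.
    Returns (m0, m1), or None when Delta = A*C - B^2 vanishes, i.e. the
    state was never observed or observed at a single instant only.
    """
    A = sum(gamma_i)
    B = sum(g * n for n, g in enumerate(gamma_i, start=1))
    C = sum(g * n * n for n, g in enumerate(gamma_i, start=1))
    Sx = sum(g * xn for g, xn in zip(gamma_i, x))
    Sxn = sum(g * xn * n
              for n, (g, xn) in enumerate(zip(gamma_i, x), start=1))
    delta = A * C - B * B
    if abs(delta) < tol:
        return None
    m0 = (C * Sx - B * Sxn) / delta
    m1 = (A * Sxn - B * Sx) / delta
    return m0, m1


# Noise-free example: x[n] = 2 + 0.5 n is recovered for any positive weights.
weights = [0.1, 0.9, 0.5, 0.3, 1.0, 0.2]
samples = [2.0 + 0.5 * n for n in range(1, 7)]
fit = fit_linear_mean(weights, samples)
```

The same expressions hold for complex observations, since the weights and the time index are real.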

Finally, special measures should be taken for the
above-mentioned cases in which $m_i^{(0)}$ and $m_i^{(1)}$ remain undefined. In
the first case, c1), those components of the means vector are
not considered in order to obtain h. The method adopted is
blocking them with a (diagonal) Weighting Matrix to be
included in the LS estimate of step 3. The elements of such a
matrix are a measure of the reliability of the estimation of
every component of vector m, as a function of the times this
state was observed along the sequence. To be precise:


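$$W = \begin{pmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & w_N \end{pmatrix}, \qquad w_i = \sum_{n=1}^{L} \gamma_i[n] \qquad (21)$$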

In the second case, c2), the static estimate for m replaces the
linear approximation. That is:
(22)


The resulting algorithm will be referred to from now on as the
Time-Dependent BW (TDBW) algorithm.


5.  Simulation  results 

We tested the performance of the algorithm for the
channels described in the ETSI recommendations. The speed
of the mobiles in each environment was chosen according to
[8]. Among all the cases, the most interesting ones were
RA250 and RA100 (Rural Area Environment; speeds of
250 and 100 km/h), since their channel coherence time-intervals
are the shortest. It should also be remarked that a
sampling rate of 2 samples/symbol was considered to
compensate for possible timing errors.


Fig.  2:  Tracking  for  the  first  tap  of  a  RA250  channel  vs.  time 
in  amplitude  and  phase  for  the  proposed  algorithm.  Dashed 
lines  stand  for  the  true  channel;  solid  lines  for  the  TDBW 
estimate. 

The channel tracking properties of the proposed algorithm
are shown in Fig. 2 and Fig. 3. It can be observed that such
properties are good as long as the linear approximation for
the channel evolution is feasible. Comparing those figures
with those obtained with the ABW algorithm (Fig. 4), we
conclude that CIR tracking is now much less noisy.

In addition, there is now no need to wait for the
algorithm to converge within the first samples of each
timeslot. Moreover, the TDBW version is far more robust
against deep fades, which usually make the ABW algorithm
lose track. The reason for this robustness is that
TDBW is a batch-type algorithm, where every sample in the
timeslot is used to estimate the CIR at every instant (even
in deep local fades), whereas in the ABW version the
estimate relies mainly on the previous and, possibly,
already-faded samples. Of course, those improvements are
conditional on an approximately linear variation of the
channel, which is not required in the ABW algorithm.
However, if this requirement were not met, it would always be
possible to increase the polynomial order to obtain a better
approximation for the channel evolution.



Fig. 3: Tracking for the first tap of a RA250 channel in
rectangular coordinates (Re{$h_1(t)$} and Im{$h_1(t)$}). Dashed
lines stand for the true channel; solid lines for the TDBW
estimate.
Fig. 4: Tracking for the first tap of a RA250 channel vs.
time. Dashed lines stand for the true channel; solid lines for
the ABW estimate.
Fig. 5: Tracking for the first tap of a RA250 channel vs. time
in amplitude and phase. Dashed lines stand for the true
channel; solid lines for the BBW estimate.

The main advantage with respect to the BBW algorithm
described in section 3 is the ability to track the evolution of
the CIR along the timeslot, instead of approximating each
tap by a constant value (Fig. 5). For mobile stations
moving at rather high speed, this results in a lower BER.

On the other hand, the main drawback of the proposed
TDBW algorithm is the increase in the computational burden
when compared with both the BBW and ABW versions.
Moreover, the number of parameters to be
estimated is now double the quantity required before ($m^{(0)}$ and
$m^{(1)}$ vs. m), whereas the amount of data available to perform that
estimation is just the same (one timeslot). Consequently, for
CIRs exhibiting large delay-spreads such as HT (Hilly
Terrain environment), the variance increase in the estimation
of some components of m[n] is very severe and the
Weighting Matrix cannot prevent the system from
becoming unstable. In those cases, the only way to make the
algorithm converge is to consider larger timeslots which
contain more symbols.

6.  Conclusions 

A new technique to include the time-varying nature of the
parameters of an HMM in the BW reestimation formulas has
been presented. The resulting algorithm for blind channel
estimation, TDBW, has been compared with those proposed
in previous references (BBW and ABW), and its
performance qualitatively evaluated in a specific
environment (the GSM system). The most important
drawback of the algorithm is its high computational cost.

Future work will apply the theory of
HMMs and the developed BW-based algorithms to other
communication environments, such as Underwater Acoustics
(UWA), and to other communication problems.
$$\phi[n] = \pi \sum_{r=-R}^{R} q[r]\, a[n-r] + \theta[n] \qquad (3)$$


where $q[r] \in [0, 0.5]$ are the weights corresponding to the
(sampled) Gaussian-shaping pulse and $\theta[n] \in \{0, \pi/2, \pi, 3\pi/2\}$
accounts for the accumulated phase at instant n [5,6]. Now
we conclude that the number of transmitted symbols (bits)
involved in a single observation at the receiver is given by:

$$l_s = l_m + l_c - 1 = (2R+1) + l_c - 1 \qquad (4)$$

However, the amount of ISI produced by the GMSK
modulator for BT=0.3 can be neglected without significant
performance loss. Under this simplifying assumption, which
results in a lower number of states and a reduced
computational complexity, we get:

$$R = 0 \Rightarrow l_s = l_c \qquad (5)$$

At this point, we can already model each observation in the
received sequence, $x_L = (x[1], x[2], \ldots, x[L])^T$, as a probabilistic
function of the present state $s[n] = (a[n], \ldots, a[n-l_s+1], \theta[n])$,
obtaining a description of $x_L$ as a first-order HMM with
$N = 4 \cdot 2^{l_s}$ states.
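
To make the size of the state space concrete, the following sketch enumerates the states of such a first-order HMM as tuples of the most recent bits followed by the accumulated phase. The bit alphabet {-1, +1}, the ordering of the states and the function name are assumptions made for illustration only.

```python
from itertools import product
import math


def enumerate_states(num_symbols):
    """List all states (a[n], ..., a[n - num_symbols + 1], theta[n]).

    Bits are taken in {-1, +1} (an assumption) and the accumulated phase
    in {0, pi/2, pi, 3*pi/2}, giving 4 * 2**num_symbols states.
    """
    phases = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
    return [bits + (theta,)
            for bits in product((-1, 1), repeat=num_symbols)
            for theta in phases]


states = enumerate_states(3)   # 4 * 2**3 = 32 states
```

The exponential growth of the number of states with the channel memory is what motivates neglecting the modulator-induced ISI in (5).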

3.  Overview  of  the  BW  algorithm 


On the basis of this first-order HMM and by means of the
BW algorithm, it is possible to obtain a solution to the
problem of the identification of the unknown parameters of
the model, that is, $\sigma^2$ and $h = (h_0 \ldots h_{l_c-1})^T$. To be precise,
the parameters actually estimated are $\sigma^2$ and the means vector
$m = (m_1 \ldots m_N)^T$ corresponding to the noise-free ISI-corrupted
received signal associated with the N states of the system.
Assuming a FIR model for the channel, m is related to h
through the linear constraints:

$$m = D h, \qquad h = D^{\#} m \qquad (6)$$

where the rows of D correspond to the consecutive modulator
outputs associated with the N different states of the system, and
$D^{\#}$ denotes the pseudoinverse of D. The batch Baum&Welch (BBW)
algorithm, thoroughly explained in [7], can be outlined as follows.

1.  Projection of h on m by means of the additional
linear constraints:

$$m = D h \qquad (7)$$

2.  Reestimation of $\sigma^2$ and m using the BW
reestimation formulas.

3.  Least Squares (LS) estimation of h using again the
linear constraints:

$$h = D^{\#} m \qquad (8)$$

4.  Repeat steps 1..3 until convergence.


The BW reestimation formulas used in step 2 are as
follows:

$$m_i = \frac{\sum_{n=1}^{L} \gamma_i[n]\, x[n]}{\sum_{n=1}^{L} \gamma_i[n]}, \quad 1 \le i \le N \qquad (9)$$

$$\sigma^2 = \frac{1}{L} \sum_{n=1}^{L} \sum_{i=1}^{N} \gamma_i[n]\, |x[n] - m_i|^2 \qquad (10)$$

where $\gamma_i[n]$ is the probability of being in state i at instant n
given the model and the observed sequence. However, these
formulas implicitly assume the CIR to be stationary within a
timeslot duration, which is not realistic when the timeslot is
long enough or the channel varies rapidly. Hence, we will
obtain other reestimation formulas to solve this problem.
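
A static reestimation step of this kind can be sketched as follows, assuming the standard Baum-Welch updates for Gaussian observations with state-dependent means and a common variance, and with the state probabilities $\gamma_i[n]$ already available (for instance from a forward-backward pass, which is not shown). The function name and the handling of unobserved states are illustrative choices.

```python
def reestimate_means_and_variance(gamma, x):
    """Static BW-style updates from state probabilities.

    gamma[i][n] is the probability of state i at instant n and x[n] the
    observation. States that are never observed get a mean of None and
    are skipped in the variance estimate.
    """
    num_states, length = len(gamma), len(x)
    means = []
    for i in range(num_states):
        occupancy = sum(gamma[i])
        if occupancy > 0:
            means.append(sum(g * xn for g, xn in zip(gamma[i], x)) / occupancy)
        else:
            means.append(None)
    variance = sum(gamma[i][n] * abs(x[n] - means[i]) ** 2
                   for i in range(num_states) if means[i] is not None
                   for n in range(length)) / length
    return means, variance


# Two states, hard assignments: means 2.0 and 11.0, variance 1.0
means, variance = reestimate_means_and_variance(
    [[1, 1, 0, 0], [0, 0, 1, 1]], [1.0, 3.0, 10.0, 12.0])
```

Because each mean is a single constant over the whole timeslot, any drift of the channel within the slot is absorbed into the variance estimate, which is the limitation the modified algorithm addresses.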


4.  Modified  algorithm 

Several strategies can be considered to cope with the
time-varying nature of the channel: estimating the CIR with a
recursive adaptation scheme such as LMS, which is referred to in
[7] as the Adaptive BW (ABW) algorithm; or fragmenting
each timeslot into subblocks. As opposed to those methods, we
will try to include the time-varying nature of the channel
directly in the reestimation formulas.

We can approximate the evolution of every tap in the
CIR, $h_j$, by means of a polynomial in n:

$$h_j[n] = h_j^{(0)} + h_j^{(1)} n + h_j^{(2)} n^2 + \ldots \qquad (10)$$

For the channels specified in the ETSI recommendations and
assuming the speed of the mobile to be less than 250 km/h,
the linear approximation was observed to be good enough.
Assembling all the taps in a single column vector:

$$h[n] = h^{(0)} + h^{(1)} n \qquad (11)$$

and applying the linear transformation described above

$$m[n] = D h[n] = D(h^{(0)} + h^{(1)} n) = m^{(0)} + m^{(1)} n \qquad (12)$$

we observe that the evolution of the means is also linear.
Vectors $m^{(0)}$ and $m^{(1)}$ will be obtained in order to minimize
the MSE given by the following expression:
Abstract

The problem of identifying/equalizing a digital communication
channel based on its temporally or spatially
oversampled output has recently gained much
attention (single-input/multiple-output - SIMO -
deconvolution). In this context, we propose a new joint
data/channel estimation method. Our technique relies
on the minimization of a bilinear MSE cost function,
where the variables to be adjusted are the channel
coefficient matrix and a linear equalizer. We
show that this a priori choice of a linear equalization
structure allows the derivation of a second-order unimodal
criterion, leading to globally convergent
identification/equalization schemes. The proposed method is
completely blind in that 1) no assumption is required
upon the transmitted sequence statistics or alphabet,
and 2) it shows some robustness with respect to the
channel order estimation problem (thus improving on
most previous related works). It also allows the free
choice of a delay in the equalizer so that output noise
amplification can be optimized.

1  Introduction 

In the context of digital radiocommunications, the signals
are transmitted through propagation channels which introduce
intersymbol interference (ISI). The channels can be
represented as FIR filters which have to be identified and/or
equalized for the transmitted symbols to be recovered. Since
the pioneering work by Sato [1], all blind equalization
techniques (which do not rely on training sequences) have been
based on the use of higher-order statistics (HOS) of the
received signals, though HOS methods largely suffer from slow
convergence and ill-convergence problems [2]. Recently, it was
shown by Gardner [3] and Tong et al. [7] that blind
deconvolution based solely on second-order statistics is
feasible, provided the observed signals can be seen as the
outputs of a SIMO system with sufficient channel disparity
(the different channel polynomials should not have any common
zero). In this context, a number of contributions have been
made in which the transmitted sequence or the channel
coefficients are recovered through subspace decompositions of
either the received data matrix (see the so-called
deterministic methods [4,5,6]) or the received data
correlation matrix (see the stochastic methods [7,8]). Other
interesting approaches were also studied in [9,10,11,12].
Here, we introduce a blind and mainly adaptive estimation
method in which a multichannel estimate and a linear equalizer
are adjusted so as to minimize an observation fitting cost
function. The possible local minima of the proposed criterion
are investigated and global convergence is established. The
presented algorithm shows several attractive features which
make it an interesting alternative to most existing methods:

•  First, it shows some robustness with respect to the
additive noise (though it is optimal only in the noise free
case) and to possible errors in the channel order estimation.

•  It also allows the use of any reconstruction delay in the
equalizer so that noise variance may be optimized at the
equalizer's output.

Notations: \(\Re\) real part of a complex number; \(E()\) statistical
expectation; \(()^{*}\) complex conjugation; \(()^{T}\) transposition;
\(()^{+}\) trans-conjugation; \(|\,.\,|\) L2-norm of a vector or matrix;
\(I\) identity matrix.

2  Multichannel  representation 

The SIMO equivalent model of a digital communication system
relies on the existence of a number L of different linear
time-limited digital filters (channels) \(h^{1}, .., h^{L}\) driven by the
same PAM/QAM sequence \(s_k\), the noisy outputs of which are
observed:

\[ x_n^{i} = \sum_{k=0}^{M} h_k^{i}\, s_{n-k} + b_n^{i} \quad \text{for } i = 1..L \tag{1} \]

\(s_k\) and \(b_n^{i}\), \(i = 1..L\), are mutually uncorrelated processes,
not necessarily white. We assume (w.l.o.g.) \(E|s_k|^2 = 1\),
\(E|b_n^{i}|^2 = \sigma_b^2\). In the context of antenna-array based
reception, the channel \(h^{i}\) represents the baud-rate impulse
response of the propagation channel linking the transmitter
and the i-th antenna (spatial diversity).

0-8186-7576-4/96 $5.00 © 1996 IEEE

In a mono-antenna scenario, channel diversity can still be
obtained by means of temporal oversampling with a factor L
(compared to the baud rate) at the antenna output, leading to
fractionally-spaced (FS) reception. In the FS context, the
channels \(h^{i}\) correspond to sampled versions (at rate T) of a
single propagation channel, at various sampling phases
\((i-1)T/L\), \(i = 1..L\) (see [7] for more details). Here, M
denotes the ISI length. We adopt the following vectorized
notations:

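\[ x_n = [x_n^{1}, .., x_n^{L}]^{T}, \qquad b_n = [b_n^{1}, .., b_n^{L}]^{T}, \qquad h = [h_0, .., h_M] \]

Then, we have

\[ x_n = h\,[s_n, .., s_{n-M}]^{T} + b_n \tag{2} \]

Consider the space-time sample vectors
\(X_n = [x_n^{T}, .., x_{n-N+1}^{T}]^{T}\), \(B_n = [b_n^{T}, .., b_{n-N+1}^{T}]^{T}\)
and \(S_n = [s_n, .., s_{n-P+1}]^{T}\),
where N is the window size per channel and P = M+N is the
number of symbols involved in the expression of vector \(X_n\).
The following linear model holds

\[ X_n = T(h)\,S_n + B_n, \tag{3} \]

where T(h) is the so-called LN x P Sylvester matrix (with
\(h(k) = h_k\)):

\[ T(h) = \begin{pmatrix} h(0) & \cdots & h(M) & 0 & \cdots & 0 \\ & \ddots & & \ddots & & \\ 0 & \cdots & 0 & h(0) & \cdots & h(M) \end{pmatrix} \]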

To enable blind deconvolution of our SIMO system, we assume
throughout the paper that (H1) T(h) has full column rank P,
with LN ≥ P. In the following, we concentrate on the joint
blind estimation of the pair \((h, \omega)\), where \(\omega\) is an LN x 1
linear equalizer satisfying the following condition in the
noiseless case: \(\omega^{T} T(h) = [0, .., 0, 1, 0, .., 0]\). Note that,
although written as a zero-forcing (ZF) equalization problem,
the actual (noisy) problem does not fully reduce to a ZF
equalizer.
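
As an illustration of the Sylvester structure in (3), the following minimal sketch builds T(h) for a toy channel in plain Python. The function names `sylvester` and `is_tall`, the list-of-lists layout `h[l][k]` (tap k of channel l) and the toy values are assumptions made for this example only; the row ordering (time shift first, channel second) is one choice compatible with the stacking of the space-time vector. The check LN ≥ P is only a necessary condition for full column rank, not a proof of (H1).

```python
def sylvester(h, N):
    """Build the LN x (M+N) block-Toeplitz (Sylvester) matrix T(h).

    h[l][k] is tap k of channel l (L channels, M+1 taps each) and
    N is the window size per channel. Row i*L + l holds channel l
    shifted right by i columns.
    """
    L, M = len(h), len(h[0]) - 1
    P = M + N
    T = [[0] * P for _ in range(L * N)]
    for i in range(N):              # time shift inside the window
        for l in range(L):          # channel index
            for k in range(M + 1):  # tap index
                T[i * L + l][i + k] = h[l][k]
    return T


def is_tall(T):
    """Necessary (not sufficient) condition for full column rank: LN >= P."""
    return len(T) >= len(T[0])


h = [[1, 2], [3, 4]]    # toy example: L = 2 channels, M = 1
T = sylvester(h, N=2)   # 4 x 3 matrix
```

With L = 2, M = 1 and N = 2 the matrix has LN = 4 rows and P = 3 columns, so the dimension requirement of (H1) is met.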

3  The  proposed  method 

Assume in a first step that M is known. Consider an
L x (M+1) matrix \(\hat{h}\) (channels estimate) and an LN x 1 vector
\(\hat{\omega}\) (equalizer estimate). Our algorithm aims at tuning these
channel and equalizer estimates so that the convolution
product between each channel estimate and the equalizer
output matches the observed signals, as sketched in fig. 1.
Mathematically, this reads:

\[ \text{minimize } J(\hat{h}, \hat{\omega}) = E\,| x_n - \hat{h}\,\mathcal{X}_n^{T} \hat{\omega} |^2, \tag{4} \]

where \(\mathcal{X}_n = [X_n, .., X_{n-M}]\) is an LN x (M+1) sample matrix.
This observation fitting criterion is bilinear in the
coefficients of \(\hat{\theta} = (\hat{h}, \hat{\omega})\). It is similar in spirit to the
previously proposed deterministic maximum likelihood (DML)
criterion [9]. The DML method, in which the linear equalizer
is typically not a free variable (being replaced by a
pseudo-inverse of \(T(\hat{h})\)), is however subject to
ill-convergence and is computationally demanding as well.
This is not the case here, as will be shown in the following.

3.1 Criterion minima

Assume a noise free (\(\sigma_b^2 = 0\)) situation. Consider any
solution of the form \(\hat{\theta} = (\alpha h, \omega/\alpha)\), where \(\omega\) is some ideal
zero-delay equalizer and \(\alpha\) is an arbitrary complex scalar.
Clearly, \(\hat{\theta}\) achieves global minimization of our criterion, and
thus provides a stationary point of J(), since J() is
nonnegative. Conversely, it may be shown that J = 0 leads to
channel equalization and identification in the absence of
noise:

Lemma 3.1 Let \(z_n\) be the L x 1 residual error process for
some \(\hat{\theta} = (\hat{h}, \hat{\omega})\), defined as \(z_n = x_n - \hat{h}\,\mathcal{X}_n^{T}\hat{\omega}\).
Assume \(s_n\) is persistently exciting of order at least 2M.
Suppose \(J(\hat{\theta}) = 0\), i.e. \(z_n = 0\) almost surely. Then
\(\hat{\theta} = (\alpha h, \omega/\alpha)\), where \(\omega\) is an ideal zero-delay equalizer
and \(\alpha\) is a complex scalar.

Proof If \(x_n = \hat{h}\,\mathcal{X}_n^{T}\hat{\omega}\), we also have from (3)

\[ X_n = T(h)\,S_n = T(\hat{h})\,[X_n, .., X_{n-P+1}]^{T}\hat{\omega}. \]

Under the persistent excitation condition, the subspace
spanned by the observed vectors \(X_n\) is found to be
simultaneously that of \(T(h)\) and \(T(\hat{h})\). By theorem 2 in [8],
we have \(\hat{h} = \alpha h\). It follows that \(s_n = \alpha X_n^{T}\hat{\omega}\), showing
that \(\hat{\omega} = \omega/\alpha\) for a zero-delay equalizer \(\omega\). □

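3.2 Stability of minima

Here, we check the absence of undesired stable local minima.
Let \(\hat{\theta} = (\hat{h}, \hat{\omega})\) be any stationary point (cancelling the first
partial derivatives) of J(). The stability of \(\hat{\theta}\) is
investigated through the second-order expansion of the
criterion: let \(\delta\theta = (\delta h, \delta\omega)\) be a small move around \(\hat{\theta}\),
and let \(\Delta J = J(\hat{\theta} + \delta\theta) - J(\hat{\theta})\). We find, up to the
second order:

\[ \Delta J \approx E\,| z_n - \delta h\,\mathcal{X}_n^{T}\hat{\omega} - \hat{h}\,\mathcal{X}_n^{T}\delta\omega |^2 - E\,| z_n |^2, \]

where \(z_n\) is the residual of \(\hat{\theta}\), i.e. \(z_n = x_n - \hat{h}\,\mathcal{X}_n^{T}\hat{\omega}\). It
can be inferred that

Lemma 3.2 \(\hat{\theta}\) is a stable minimum if and only if all
components in \(z_n\) are decorrelated from \(X_n\). Due to the
particular form of \(z_n\), this implies \(z_n = 0\).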
Due to lack of space, the proof of this lemma will be
detailed in a forthcoming paper. We have thus established
that a gradient-based algorithm based on the noise free
criterion in (4) always converges to the true (channels,
equalizer) pair, up to an arbitrary scalar constant.

3.3 Robustness

With respect to noise The presence of additive noise causes
bias in the results; however, the simulations indicate that
acceptable channel and equalizer estimates can be obtained
from this algorithm under realistic SNR conditions. Bias
removal techniques, based on some norm-constrained
minimization, can also be adapted to our problem [14].

With respect to model order In practical situations, the
multichannel order (M) is probably not well defined,
especially in the case where the channel coefficients taper
off at the ends. Then, overestimation of M is likely to
occur. We stress the robustness of the proposed method with
respect to such errors. The proof goes as follows: let
\(\hat{h} = [\hat{h}_0, .., \hat{h}_K]\) be the channel candidate, with K > M. As in
the case of correct order estimation, the minimization of
\(J(\hat{h}, \hat{\omega})\) leads to \(x_n = \hat{h}\,\mathcal{X}_n^{T}\hat{\omega}\), which again gives
\(\mathrm{span}(T(\hat{h})) = \mathrm{span}(T(h))\). As a result the channels in \(\hat{h}\)
admit a common polynomial factor of order K - M, denoted
\(q(z)\), and \(T(\hat{h})\) factors into \(T(h)Q\), where Q is the
P x (N+K) Sylvester matrix associated with \(q(z)\) [13]. We have

\[ T(h)\,S_n = T(h)\,Q\,[S_n, .., S_{n-K-N+1}]^{T}\,T(h)^{T}\hat{\omega} \]
\[ S_n = Q\,[S_n, .., S_{n-K-N+1}]^{T}\,T(h)^{T}\hat{\omega} \]

and it is clearly seen that such a condition cannot hold with
\(q(z) \neq 1\), unless there exists some recurrence relationship
between the successive emitted symbols, a fact which is not
compatible with the persistent excitation assumption. As a
result, the only possible solution of the above equation is
\(q(z) = 1\) and \(T(h)^{T}\hat{\omega} = I\). This result is checked below in
the simulations section.
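
The factorization of an over-parameterized Sylvester matrix into T(h)Q can be checked numerically on a toy case. The sketch below is only an illustration under assumed values: the helper names `sylvester`, `convolve`, `matmul`, the two real-valued channels and the common factor `q` are all hypothetical and chosen so that the arithmetic stays exact.

```python
def sylvester(h, N):
    """LN x (M+N) Sylvester matrix; h[l][k] is tap k of channel l."""
    L, M = len(h), len(h[0]) - 1
    T = [[0] * (M + N) for _ in range(L * N)]
    for i in range(N):
        for l in range(L):
            for k in range(M + 1):
                T[i * L + l][i + k] = h[l][k]
    return T


def convolve(a, b):
    """Polynomial product of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


h = [[1, 2], [1, -1]]                  # assumed true channels, M = 1
q = [1, 1]                             # assumed common factor, order K - M = 1
h_over = [convolve(c, q) for c in h]   # order-K candidate, K = 2
N = 2
P = N + len(h[0]) - 1                  # P = M + N = 3
Q = sylvester([q], P)                  # P x (N + K) Sylvester matrix of q(z)
lhs = sylvester(h_over, N)             # T(h_over)
rhs = matmul(sylvester(h, N), Q)       # T(h) Q
```

Here `lhs` and `rhs` coincide, which is the matrix form of the statement that every channel of the candidate shares the factor q(z).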

4  Equalization  with  non-zero  delay 

The performance of a linear multichannel equalizer generally
depends on its delay [15]. Hence, it is useful to control the
delay introduced by the equalizer. This is easily obtained by
rewriting the proposed criterion as

\[ J(\hat{h}, \hat{\omega}) = E\,| x_{n-d} - \hat{h}\,\mathcal{X}_n^{T}\hat{\omega} |^2, \]

where d is a chosen delay parameter. Note however that in
case of model order overestimation, the actual equalizer's
delay may not be determined in advance, since the non-zero
channel coefficient estimates in \(\hat{h}\) are subject to a possible
shift. However, this problem should not be very severe in the
presence of noise.

5  Adaptive  algorithm 

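A possible implementation of the proposed method, which
allows full adaptivity in the context of time-varying
statistics/channels, is as follows, based on a stochastic
gradient approximation. Note that other approaches based on
recursive least squares can also be used.

\[ \hat{h}_{n+1} = \hat{h}_n - \lambda\,(\hat{h}_n y_n - x_n)\,y_n^{+}, \]
\[ \hat{\omega}_{n+1} = \hat{\omega}_n - \lambda\,\mathcal{X}_n^{*}\,\hat{h}_n^{+}\,(\hat{h}_n y_n - x_n), \]

where \(y_n = \mathcal{X}_n^{T}\hat{\omega}_n\) and \(\lambda\) is a small stepsize.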
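
A single iteration of such a coupled stochastic gradient scheme can be sketched in plain Python as follows. This is a hedged illustration, not the authors' implementation: the function name `sg_step`, the stepsize name `mu`, and the convention that the equalizer output is the unconjugated product of the equalizer with each space-time vector are assumptions, and the equalizer update is obtained here by differentiating the instantaneous squared residual under that convention.

```python
def sg_step(h, w, x, X_hist, mu):
    """One stochastic-gradient step on |x - h y|^2.

    h      : L x (M+1) channel estimate, h[l][k]
    w      : equalizer estimate (length LN)
    x      : current observation vector (length L)
    X_hist : the M+1 most recent space-time vectors X_n, .., X_{n-M}
    mu     : stepsize (hypothetical name)
    Assumed convention: y[k] = sum_i w[i] * X_hist[k][i] (no conjugation).
    """
    K1, L = len(X_hist), len(h)
    # equalizer outputs on the M+1 most recent space-time vectors
    y = [sum(wi * Xi for wi, Xi in zip(w, Xk)) for Xk in X_hist]
    # residual e = x - h y
    e = [x[l] - sum(h[l][k] * y[k] for k in range(K1)) for l in range(L)]
    # channel update: h <- h + mu * e * y^H
    h_new = [[h[l][k] + mu * e[l] * y[k].conjugate() for k in range(K1)]
             for l in range(L)]
    # equalizer update: w <- w + mu * sum_k (h_k^H e) * conj(X_{n-k})
    g = [sum(h[l][k].conjugate() * e[l] for l in range(L)) for k in range(K1)]
    w_new = [w[i] + mu * sum(g[k] * X_hist[k][i].conjugate() for k in range(K1))
             for i in range(len(w))]
    return h_new, w_new, e


# Toy scalar case (L = 1, M = 0, N = 1), values chosen for easy checking.
h0, w0 = [[1 + 0j]], [0.5 + 0j]
x, X_hist = [1 + 0j], [[1 + 0j]]
h1, w1, e0 = sg_step(h0, w0, x, X_hist, mu=0.1)
_, _, e1 = sg_step(h1, w1, x, X_hist, mu=0.1)   # residual after one update
```

In this toy case the residual magnitude shrinks from 0.5 to 0.43625 after one update, as expected for a small stepsize.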

6 Simulations

We consider the context of L = 2 randomly chosen channels of
length M+1 = 5, given by
\(h^{1}\) = [-0.089 - 0.489i; -0.340 - 0.010i; 0.022 - 0.069i;
-0.192 - 0.031i; 0.464 - 0.613i],
\(h^{2}\) = [0.422 + 0.467i; -0.075 + 0.320i; 0.185 - 0.049i;
0.223 + 0.122i; 0.145 - 0.609i],
driven by a white QPSK sequence. The output SNR is set to the
same level on each of the (normalized) channels. We choose
N = 5 and consider only the zero delay case (d = 0). Fig. 2
shows the equalization results in terms of output mean square
error between the transmitted and the recovered symbols,
using the linear equalizer \(\hat{\omega}_n\) provided by the algorithm,
versus the iteration number n. Both cases of a correct model
order estimation (K = M = 4) and of a severe overestimation
(K = 8) are illustrated.

Fig. 3 shows the identification results in terms of the
distance between \(\hat{h}_n\) and the true channels, up to \(\alpha\), where
\(\alpha\) is defined in lemma 3.1. This distance is defined by
\(| \hat{h}_n/\alpha - h |^2 / | h |^2\). Note that channel identification is
well achieved despite the additive noise. This is due to the
MSE-like structure of the criterion. Robustness with respect
to the model order error is confirmed by both equalization
and identification results, though the obtained performance
seems to degrade in the case K = 8. The rise in the
steady-state error is due to adaptation noise, which can be
compensated for by decreasing the stepsize. This robustness
gives an advantage over the methods found in [7,8,4,5].

7  Discussion 

We have addressed the problem of blind (adaptive) estimation
of both the channel coefficients and the input in a SIMO
context. In the proposed criterion, the channel estimate and
a linear equalizer are independent variables to be adjusted
so as to match the observed signal in a least mean square
sense. The minimization of this criterion asymptotically
leads to the "true" (channel, equalizer) pair in the absence
of noise, up to some scalar constant, and is robust to
possible model order errors thanks to the particular
structure chosen for the equalizer. A gradient descent
implementation was proposed, which however provides rather
slow convergence. Further work will include the study of
implementations of the recursive least-squares type, for
example by alternating RLS algorithms on the equalizer and
the channel estimates.

Figure 1: The joint channels/equalizer estimation setup.

Figure 2: Output mean square error using the linear equalizer
\(\hat{\omega}\) given by the gradient algorithm in section 5. Stepsize
0.02. 'a' curve: K = 4, 'b' curve: K = 8.

Figure 3: Identification by the channel estimate \(\hat{h}\) given by
the gradient algorithm in section 5. Stepsize 0.02. 'a' curve:
K = 4, 'b' curve: K = 8.

Abstract

This contribution addresses the problem of optimal blind
linear symbol recovery, using the channel diversity induced
by a sensor array or time oversampling. We present a
technique allowing the computation of a minimum mean-square
error (MMSE) equalizer, based on the optimization of
quadratic second-order functions. The proposed technique
improves on existing multichannel equalization methods, in
that previous methods generally build on criteria which are
optimal only in the noise free context. Our criterion also
allows free choice of the delay for the symbol recovery. As a
consequence, MMSE equalization performance can be enhanced
through the use of an optimal delay. To this end, a
performance analysis is conducted in order to investigate
some of the links between the delay and the symbol estimation
accuracy.

1  Introduction 

Blind multichannel deconvolution exploiting the channel
diversity induced by sensor arrays and/or time oversampling
has attracted a lot of research effort in recent years.
Methods can now be found in the literature, based on the
minimization of various second-order criteria, which offer
promising alternatives to the previously reported
higher-order based techniques. These methods are basically
multichannel batch deconvolution techniques in which the
transmitted sequence or the channel coefficients are
recovered through subspace decompositions of either the
received data matrix (see the so-called deterministic methods
[3,5,4]) or the received data correlation matrix (see the
stochastic methods [1,2]), while other interesting approaches
were also studied in [7,8,9,12]. Some methods, performing a
channel pre-identification, have to be linked to an extra
equalization stage, thus increasing the global cost of the
reception scheme. In the (numerous) communications
applications where low complexity and/or tracking ability is
sought, direct on-line equalization techniques requiring no
channel pre-identification are to be favored. Significant
improvements can also be gained in this context from channel
diversity. These improvements concern two main points: (i)
convergence reliability (diversity allows the use of a
second-order unimodal error function for equalization), and
(ii) equalization accuracy, using a simple linear structure,
since finite-length zero-forcing equalizers are available in
the multichannel context. However, these existing direct
equalization methods generally suffer from a lack of
robustness in the sense that they build on criteria which are
optimal in the noise free context, but not (or even far from)
optimal in practical noisy situations. This typically
includes the prediction-based methods [7,8] but also, to a
lesser extent, the mutually referenced equalizers (MRE)
method in [9]. As another lack of optimality, most existing
on-line multichannel techniques ([9] being an exception) are
unable to exploit the performance gain that stems from the
choice of a proper delay for symbol recovery [7,10]. This
paper investigates solutions to these problems. Our
contribution is two-fold: (1) we present a technique, based
on a modification of the criterion initially introduced in
[9], allowing the derivation of a blind linear multichannel
equalizer, optimal in the MMSE sense; (2) by this approach,
we show that we may improve on the robustness of the obtained
equalizer through a proper tuning of the reconstruction
delay: the criterion is optimal in the MMSE sense for a given
reconstruction delay, which can be chosen so that the
corresponding MMSE is minimal among all possible delays.
Finally, a theoretical study is conducted, based on the
computation of Cramér-Rao bounds, that provides a guideline
for the choice of the optimal delay. Notations: \(E()\)
statistical expectation; \(()^{*}\) complex conjugation; \(()^{T}\)
transposition; \(()^{+}\) trans-conjugation; \(|\,.\,|\) L2-norm of a
complex scalar, vector or matrix.

2  Problem  statement 

The multichannel model of a digital communication system
relies on the existence of L different channels, modeled by
finite degree linear digital filters \(h^{1}, .., h^{L}\), driven by the
same PAM/QAM sequence \(s_k\). Their L noisy outputs are
observed:

\[ x_n^{i} = \sum_{k=0}^{M} h_k^{i}\, s_{n-k} + b_n^{i} \quad \text{for } i = 1..L \tag{1} \]

We assume throughout the paper that the sequences \(s_k\) and
\(b_n^{i}\), \(i = 1..L\), are mutually uncorrelated, the \(b_n^{i}\),
\(i = 1..L\), being white processes with \(E|b_n^{i}|^2 = \sigma_b^2\). Note
that the emitted sequence \(s_k\) is not required to be white, in
contrast with [7,8,10]. In a practical context, the \(h^{i}\)'s
either represent the baud-rate sampled versions of a single
physical channel with various sampling phases (time
diversity), or the channel linking the transmitter and the
i-th sensor on an antenna array (space diversity); see [1,2]
for further details. Here, M denotes the ISI length. Consider
the LN x 1 space-time signal and noise processes with time
window of size N defined by

\[ X_n = [x_n^{1}, .., x_{n-N+1}^{1}, .., x_n^{L}, .., x_{n-N+1}^{L}]^{T}, \qquad B_n = [b_n^{1}, .., b_{n-N+1}^{1}, .., b_n^{L}, .., b_{n-N+1}^{L}]^{T}. \]

Let P = M + N denote the number of symbols involved in the
expression of \(X_n\). The following linear model holds

\[ X_n = \mathcal{H} S_n + B_n \tag{2} \]

where \(S_n = [s_n, .., s_{n-P+1}]^{T}\) and \(\mathcal{H}\) is the so-called
LN x P Sylvester matrix, defined by

\[ \mathcal{H} = \begin{pmatrix} h_0^{1} & \cdots & h_M^{1} & 0 & \cdots & 0 \\ & \ddots & & \ddots & & \\ 0 & \cdots & 0 & h_0^{1} & \cdots & h_M^{1} \\ \vdots & & & & & \vdots \\ h_0^{L} & \cdots & h_M^{L} & 0 & \cdots & 0 \\ & \ddots & & \ddots & & \\ 0 & \cdots & 0 & h_0^{L} & \cdots & h_M^{L} \end{pmatrix} \]

The identifiability/equalizability conditions of the single
input/multiple output system above are established in [1,6],
and can be restated as

•  (H1) \(\mathcal{H}\) has full column rank (LN ≥ P).

3  Blind  MMSE  Equalization 

In the following, it is shown how noise free adaptive linear
equalizers, i.e. vectors satisfying the ZF condition, can
first be obtained using the method of mutually referenced
filters, and then be exploited to train a blind MMSE
equalizer.

3.1  Derivation  of  adaptive  ZF  equalizers 

It is seen from (2) that an LN x 1 vector (denoted \(V_j\)),
given by any row of any left-inverse matrix of \(\mathcal{H}\), will
satisfy the ZF condition up to a delay between 0 and P - 1,
i.e. \(V_j^{+}\mathcal{H} = [0, .., 0, 1, 0, .., 0]\). The MRE criterion
introduced in [9] provides a useful means to compute such
equalizers blindly and adaptively, in the noise free case.
The main result of [9] can be restated as follows:

Lemma 3.1 Consider a set of P linear equalizers
\(V_0, V_1, ..., V_{P-1}\) and adjust them so as to cancel the
quadratic function

\[ J_{MRE}(V_0, .., V_{P-1}) = \sum_{i=0}^{P-2} E\,| V_i^{+} X_n - V_{i+1}^{+} X_{n+1} |^2 \]

under the constraint

\[ \sum_{j=0}^{P-1} \| V_j \|^2 = 1. \]

Then \(V_j\), for j = 0..P-1, is a "j-delay" exact ZF
equalizer, i.e. \(V_j^{+} X_n = \alpha s_{n-j}\), where \(\alpha\) is an arbitrary
constant scalar.

In this result, the information redundancy provided by
several independent equalizers associated with different
delays is exploited to build a second-order criterion.
However, in noisy situations, the minimizers of the MRE
criterion are no longer optimal and the obtained filters
\(\{V_j\}\) are biased. Fortunately, the MRE criterion can be
modified so that the new performance surface becomes noise
independent. The proposed bias removal technique, inspired by
[11], consists in replacing the former unit-norm constraint
by a new quadratic constraint that incorporates the knowledge
of the noise covariance matrix structure. The technique is as
follows: let \(V = [V_0^{T}, .., V_{P-1}^{T}]^{T}\) be the LNP x 1 vector
consisting of all the equalizer entries. \(J_{MRE}\) can be
rewritten in a compact matrix form as \(J_{MRE}(V) = V^{+}\mathcal{R}V\),
where \(\mathcal{R}\) is a sparse LNP x LNP matrix made from sub-blocks
\(E(X_{n+1}X_n^{+})\) and \(E(X_nX_n^{+})\). Under the white noise and
noise/signal decorrelation assumptions, \(\mathcal{R}\) splits into a
signal part and a noise part as:

\[ \mathcal{R} = \mathcal{R}_s + \sigma_b^2\,\mathcal{R}_b, \tag{3} \]

where \(\mathcal{R}_s\) and \(\mathcal{R}_b\) are the matrix forms of the criterion
\(J_{MRE}\) in the signal-only case and noise-only case
respectively. Closed-form expressions of \(\mathcal{R}\), \(\mathcal{R}_s\) and \(\mathcal{R}_b\)
are provided in appendix A. Note that \(\mathcal{R}_s\) has a non-trivial
nullspace since \(J_{MRE}\) can be cancelled in the absence of
noise, while \(\mathcal{R}_b\) has full rank.

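Lemma 3.2 Let \(V^{opt} = [V_0^{opt\,T}, .., V_{P-1}^{opt\,T}]^{T}\) be the
LNP x 1 vector minimizing \(J_{MRE}(V) = V^{+}\mathcal{R}V\) under the
constraint \(V^{+}\mathcal{R}_bV = 1\). Then, for each j,
\(V_j^{opt\,+}X_n = \alpha s_{n-j} + V_j^{opt\,+}B_n\), i.e. \(V_j^{opt}\) is an
unbiased j-delay ZF equalizer.

Proof It is easily shown using Lagrange multipliers that
\(V^{opt}\) satisfies:

\[ \mathcal{R}V^{opt} - (V^{opt\,+}\mathcal{R}V^{opt})\,\mathcal{R}_bV^{opt} = 0 \tag{4} \]

with \(V^{opt\,+}\mathcal{R}V^{opt}\) taking the minimal possible value. From
(3), the only solutions to this problem are such that
\(\mathcal{R}_sV^{opt} = 0\), which brings us back to the noise free
solutions. Note that this technique does not require the
knowledge of \(\sigma_b^2\). □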
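
The step from (4) to the nullspace condition on the signal part can be spelled out as follows. This is a short sketch that uses only the splitting (3), the constraint of Lemma 3.2 and the rank properties stated above; the symbol \(\mu\) is a name introduced here for the Lagrange multiplier.

\[
\begin{aligned}
\mathcal{R}V &= \mu\,\mathcal{R}_bV, \qquad \mu = V^{+}\mathcal{R}V \quad (\text{since } V^{+}\mathcal{R}_bV = 1),\\
(\mathcal{R}_s + \sigma_b^2\,\mathcal{R}_b)\,V &= \mu\,\mathcal{R}_bV \;\Longrightarrow\; \mathcal{R}_sV = (\mu - \sigma_b^2)\,\mathcal{R}_bV,\\
V^{+}\mathcal{R}_sV &= \mu - \sigma_b^2 \;\ge\; 0 .
\end{aligned}
\]

Because the signal part is positive semidefinite with a non-trivial nullspace, the smallest attainable multiplier is \(\mu = \sigma_b^2\), and it is reached exactly when \(V^{+}\mathcal{R}_sV = 0\), i.e. when \(\mathcal{R}_sV = 0\). The noise variance never has to be known: it only appears as the value of the multiplier at the optimum.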
3.2 Coupling with a MMSE equalizer

Since the newly constrained mutually referenced equalizers
\(V_j^{opt}\) asymptotically provide ISI free symbol estimates, any
of these can be used to train a MMSE equalizer as in a fully
supervised context: let the LN x 1 adaptive filter \(W_d\) be
obtained through the minimization of

\[ J_{MSE}(W) = E\,| W^{+}X_n - V_j^{opt\,+}X_{n-d+j} |^2 \tag{5} \]

\(W_d\) is a d-delay MMSE equalizer, provided the noise
contributions in \(X_n\) and \(X_{n-d+j}\) are decorrelated, so that
the correct (unbiased) optimum is attained for \(W_d\). Since the
noise is assumed to be temporally and spatially white, this
condition is met if d - j ≥ N. In the following, we provide
theoretical hints clarifying the links between the delay and
the steady-state performance of \(W_d\). A general guideline is
provided for the choice of d.


4  On  the  Delay  for  Symbol  Recovery 


In this section, we investigate analytically the links
between the chosen delay and the minimum achievable output
MSE, regardless of the equalization criterion. The study is
conducted by means of Cramér-Rao bounds. We essentially make
the assumption that all transmitted symbols but one, within a
temporal window of size N, are known deterministic
quantities. Though this assumption may appear unrealistic, it
not only greatly simplifies the mathematical developments but
also permits the derivation of tractable and interpretable
equations. We assume further that the noise is white
Gaussian. Then, at time (fixed) n, the p.d.f. of a space-time
vector reads:

\[ f(X_n) = K \exp\!\left( -\frac{1}{\sigma_b^2}\,| X_n - \mathcal{H}S_n |^2 \right) \]

\[ \frac{\partial \log f(X_n)}{\partial s_{n-d}} = \frac{1}{\sigma_b^2}\,(X_n - \mathcal{H}S_n)^{+}\,h_{d+1} \]

\[ \frac{\partial}{\partial s_{n-d}^{*}}\,\frac{\partial \log f(X_n)}{\partial s_{n-d}} = -\frac{1}{\sigma_b^2}\,\| h_{d+1} \|^2 \]

where \(h_{d+1}\) denotes the (d+1)-th column of the channel
convolution matrix \(\mathcal{H}\). Now we get the classical estimation
theory result:

Lemma 4.3 Let \(\eta(X_n)\) be any unbiased estimator (possibly
non-linear) of \(s_{n-d}\), the other symbols being seen as known
deterministic variables. Then, we have

\[ E\,| \eta(X_n) - s_{n-d} |^2 \;\ge\; CRB(s_{n-d}) = \frac{\sigma_b^2}{\| h_{d+1} \|^2} \tag{6} \]

Due to the unrealistic assumptions made here, the indicated
bounds have very limited practical applicability. For
instance, the obtained Cramér-Rao bound does not account for
the possible lack of channel disparity in \(\mathcal{H}\), which may cause
some severe degradation in the estimation performance.
However, for a relatively "well conditioned" matrix \(\mathcal{H}\), the
results in (6) provide some insight into the shape of the
distribution of performance versus the delay d: due to the
Sylvester structure in \(\mathcal{H}\), its columns are such that
\(CRB(s_n) \ge ... \ge CRB(s_{n-M}) = ... = CRB(s_{n-N+1}) \le ... \le CRB(s_{n-P+1})\),
as long as N > M. This is a channel-independent result giving
a simple guideline, which consists in favoring the delays
close to or, if possible, between d = M and d = N symbol
durations in order to improve the estimation accuracy. The
fact that extreme values for the delay (typically d = 0 or
d = P - 1) provide the worst noise enhancement properties was
also confirmed in our simulations.
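
The dependence of the bound on the delay can be illustrated with a few lines of plain Python. The sketch assumes that the bound in (6) is the noise variance divided by the squared norm of column d+1 of the channel convolution matrix; the function name `crb_per_delay`, the layout `h[l][k]` (tap k of channel l) and the toy channel are hypothetical, and the end taps are assumed non-zero so that no column has zero energy.

```python
def crb_per_delay(h, N, sigma2):
    """Return sigma2 / ||column d+1||^2 for d = 0..P-1, with P = M + N.

    Column d of the Sylvester matrix collects tap k of every channel
    once for each time shift i = d - k that lies inside the window
    0 <= i < N, so its squared norm is a partial sum of tap energies.
    """
    L, M = len(h), len(h[0]) - 1
    P = M + N
    tap_energy = [sum(abs(h[l][k]) ** 2 for l in range(L))
                  for k in range(M + 1)]
    crb = []
    for d in range(P):
        energy = sum(tap_energy[k] for k in range(M + 1) if 0 <= d - k < N)
        crb.append(sigma2 / energy)
    return crb


# Toy example: L = 2 channels, M = 1, window N = 3, unit noise variance.
bounds = crb_per_delay([[1, 1], [1, 1]], N=3, sigma2=1.0)
```

For this toy channel the bound is largest at the extreme delays d = 0 and d = P - 1 and flat for d between M and N - 1, which is the shape described in the text.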

5  Implementation  and  Simulations 

A possible implementation of the blind MMSE equalization
technique presented in section 3 is based on a
straightforward coupled stochastic gradient descent of the
criteria \(J_{MRE}\), under the bias removal constraint, and
\(J_{MSE}\). The ZF equalizer estimates \(V_0, ..., V_{P-1}\) are updated
according to the constrained cost \(J_{MRE}\), the output of one of
them, \(V_j\), is selected as a reference signal, and the MMSE
equalizer estimate W is updated according to the steepest
descent of \(J_{MSE}\). Due to the lack of space the implementation
details are omitted here. A short validation of such a
technique is presented in the following communication
context: L = 2 randomly chosen channels, with degree M = 4
(the channel coefficients are those given in [12]). The
output SNR is 10 dB and the symbols are QPSK-modulated. We
take N = 5 as the number of snapshots considered altogether.
Delay optimization together with the condition d - j ≥ N
naturally suggests j = 0 for the ZF equalization delay, and
d = 5 for the MMSE equalization delay (though, strictly
speaking, a 4-delay MMSE equalizer would provide better
results).

In fig. 1, we plot a typical learning curve for the output
MSE between the transmitted and equalized data, using the
estimated 5-delay MMSE equalizer, versus the iteration
number. To check the result, the asymptotic MSE achieved by
the true 5-delay MMSE equalizer is indicated by a dashed
line. The excess MSE of the adaptive equalizer is due to the
adaptation noise and can be reduced by decreasing the
stepsize in the gradient algorithms.

In fig. 2, we check the relevance of the theoretical results
of section 4. The Cramér-Rao bounds provided by expression
(6) are plotted for different delays, using '+' symbols. For
comparison, the minimum achievable MSE with a linear
equalizer is indicated with 'o' symbols. As expected, the
lower bounds provided by the simple performance analysis in
section 4 are rather optimistic. However, the correspondence
indicated between delay and performance is roughly verified,
even assuming a sub-optimal linear structure for
equalization.

6  Conclusion 

In  this  contribution,  it  was  shown  that  asymptot¬ 
ically  ideal  ZF  adaptive  equalizers  could  be  obtained 
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through  the  minimization  of  the  MRE  criterion  and 
used  to  train  blindly  an  adaptive  MMSE  equalizer. 
In  the  presence  of  additive  noise,  the  MRE  criterion 
needs  be  modified  to  allow  the  derivation  of  unbiased 
equalizers.  In  the  proposed  algorithm,  a  non-zero  de¬ 
lay  can  be  chosen  for  the  MMSE  equalizer.  A  simpli¬ 
fied  Cramer- Rao  bounds  analysis  was  used  as  a  means 
to  give  practical  guidelines  for  the  choice  of  an  optimal 
delay.  The  numerical  simulations  results  match  rather 
well  with  the  theoretical  derivations. 
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Appendix  A 

$R_X = E(X_n X_n^+)$, $R_{1X} = E(X_{n+1} X_n^+)$. Define $A$
as the $P \times P$ diagonal matrix $\mathrm{diag}(1, 2, \ldots, 2, 1)$, and
$J_P$ as the $P \times P$ matrix with ones on the diagonal
above the main one, and zeroes elsewhere. Then, it is
rather straightforward that


where $\otimes$ is the conventional Kronecker product. We
have  in  a  similar  way: 

^  ®  “  Jp  ®  P'lB  ~  Jp  ® 


where B refers to the normalized (unit-variance) white
noise. Ip is the identity matrix of size P. Let
Yn = TiSn (noise free observation). The signal-related
matrix is written as:


TZ,  =  A®  Ry  -  Jp  <S>  R.1Y  -  Jp  ®  Riy 


algorithm  versus  the  iteration  number. 


Figure 2: 'o': minimum achievable MSE by a linear
equalizer in dB, versus the delay. '+': comparison
with the derived CR bounds.


72.  =  A  0  Rx  —  Jp  ®  Rix  ^  Jp  ®  R\x 
ABSTRACT

Hidden Markov Models (HMMs) are employed in this paper
to describe digital communication channels, and their
parameters are estimated in a blind fashion. General
nonlinear channels can be accommodated which are not restricted
to be of the Volterra type. Contrary to standard HMM
parameter estimation techniques, which resort to nonlinear
optimization of the likelihood function, the proposed
method is based on a graph theoretic approach. We exploit
the De Bruijn property of the channel's state transition
graph, and develop computationally efficient blind estimation
procedures involving shortest path searches. We show
identifiability of the associated graph problem and discuss
convergence issues. Finally, some illustrative simulations
are presented.


1.  INTRODUCTION 

Most  of  the  existing  literature  on  channel  estimation  and 
equalization  has  focused  on  linear  channels  which  can  be 
described  by  an  impulse  response  of  finite  length.  However, 
in  some  applications  the  linearity  assumption  may  not  be 
valid,  mainly  due  to  nonlinear  amplifiers  in  the  transmitters 
(or repeaters), as for example in satellite channels [3], [5],
[6]. 

Following the linear channel paradigm, a common
method of describing nonlinear channels is by using
truncated Volterra models [1]. Although Volterra series provide
a general framework for treating nonlinear systems, they
may not be perfectly suited for communication channels, as
they do not take into account the finite alphabet nature of
the input. Moreover, there is no clear indication of what
the minimum Volterra order is that would provide a
satisfactory approximation.

In this paper we regard the channel as a general nonlinear
mapping with no particular parametrizable form. The
instrumental observation, however, is that the input (and
hence the channel state) can take a finite number of different
values. Thus, the channel estimation problem is equivalent
to identifying the mapping from each state to the
corresponding channel output.

This approach to the channel estimation problem naturally
leads to the theory of finite state machines and HMMs.
In fact, general maximum likelihood (ML) techniques for
blindly estimating the parameters of HMMs are well known
[8], and have been applied in the context of communication
channels [2], [4], [10]. These approaches, however, suffer
from increased computational complexity, and convergence
problems related to the local minima of the likelihood
function [8].


In  this  paper,  we  employ  graph  theoretic  techniques  in 
connection  with  clustering  methods  to  avoid  the  likelihood 
maximization  procedure.  The  proposed  method  is  compu¬ 
tationally  efficient  and  a  unique  solution  is  guaranteed  un¬ 
der  some  identifiability  conditions. 

2.  PROBLEM  STATEMENT 

Let the received data $y(n)$, $n = 0, \ldots, N-1$ be generated
by the communication system shown in Fig. 1, i.e.,

$$y(n) = h[\mathbf{w}(n)] + v(n) \qquad (1)$$

where $\mathbf{w}(n) = [w(n), w(n-1), \ldots, w(n-q)]^T$ and the
transmitted sequence $w(n)$ consists of i.i.d., equiprobable
numbers taking values from a finite alphabet set
$A = \{a_1, a_2, \ldots, a_\alpha\}$ of size $\alpha$, $h[\cdot]$ is a linear or nonlinear channel
of memory order $q$ and $v(n)$ is zero mean, white, additive,
Gaussian noise.

The channel $h[\cdot]$ does not have to obey a certain
parametric form; however, it is not allowed to map distinct state
vectors to identical outputs, as formalized in the following
assumption:

(AS1) For every $\mathbf{w}_1 \neq \mathbf{w}_2$, the channel response is
$h[\mathbf{w}_1] \neq h[\mathbf{w}_2]$.

Under this assumption, the goal of this paper is to identify
the channel mapping $h[\mathbf{w}]$ for every possible state
$\mathbf{w} \in A^{q+1}$. Once the identification step is completed, a ML
input estimation procedure can be used (Viterbi algorithm)
to recover the input.

The proposed identification procedure consists of two
steps: A clustering algorithm is employed first to estimate
the different values the (noiseless) channel output $h[\cdot]$
can take. Then, graph-theoretic techniques are developed
to associate each of the cluster centers with the appropriate
state vector $\mathbf{w} \in A^{q+1}$.

It should be pointed out here that the channel
is uniquely identifiable only up to a permutation of the
input alphabet values. For example, in the BPSK case
$A = \{-1, +1\}$, which is indistinguishable from $A = \{+1, -1\}$
with appropriately permuted response $h[\cdot]$. In linear
channels, this inherent ambiguity manifests itself as a scaling
ambiguity.

3.  CLUSTERING 

Clustering techniques have been employed in descriptions
of communication channels when training data are
available [9]. They are used to provide the channel's (noiseless)
response, associated with each HMM state. In the case of
negligible additive noise, the clustering step becomes trivial
and can be solved by inspection of the received data $y(n)$.
Figure 1. Nonlinear Communication Channel

The most commonly used clustering method is the
K-means algorithm (e.g., [11]), which adjusts the cluster
centroids $h_k$, $k = 1, \ldots, \alpha^{q+1}$, and associates each data point
$y(n)$ with a cluster membership set $I_k$, such that the
following minimization problem is solved:

$$\min \sum_{k=1}^{\alpha^{q+1}} \sum_{y(n) \in I_k} |y(n) - h_k|^2 . \qquad (2)$$

The algorithm proceeds in an alternating fashion, optimizing
in turn the centroids $h_k$ and the cluster assignments
$I_k$.
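
The alternating minimization of (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kmeans`, the optional `init` argument, and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def kmeans(y, n_clusters, init=None, n_iter=50, seed=0):
    """Alternating (Lloyd-type) minimization of the cost in (2).

    y may be real or complex; init optionally fixes the starting centroids
    (hypothetical convenience argument, not part of the paper).
    """
    y = np.asarray(y)
    dtype = np.result_type(y.dtype, np.float64)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = rng.choice(y, size=n_clusters, replace=False).astype(dtype)
    else:
        centroids = np.array(init, dtype=dtype)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every data point.
        assign = np.argmin(np.abs(y[:, None] - centroids[None, :]) ** 2, axis=1)
        # Update step: each centroid becomes the mean of its members.
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = y[assign == k].mean()
    assign = np.argmin(np.abs(y[:, None] - centroids[None, :]) ** 2, axis=1)
    return centroids, assign
```

With random initialization the iteration can stall in a poor local minimum, so in practice several restarts (or an informed `init`) may be needed.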

4.  IDENTIFYING  THE  MARKOV  MODEL 

Once the centroids and cluster assignments have been
determined by the K-means algorithm, the state transition
probabilities need to be estimated to complete the Markov
Model description of the channel. For general HMMs, the
transition probabilities are typically estimated by maximizing
the likelihood of the observed sequence [8]. In the current
problem, however, we should exploit the a priori information
that is available about the channel and avoid the
costly maximization step. For every channel state there are
only $\alpha$ possible transitions, since the new input $w(n+1)$
can only take $\alpha$ values. Moreover, these transitions are
equiprobable.

Let $I(y(n)) \in \{1, \ldots, \alpha^{q+1}\}$ be the cluster membership
function obtained from the K-means algorithm (i.e., the
nearest neighbor cluster assignment for each data point
$y(n)$). We propose to replace the sequence $y(n)$ by $I(y(n))$
and identify the transitions from a state $k_0$ by recording all
the transitions from $k_0$ in that sequence. In the absence of
noise, only the allowable transitions will be present, while
in the noisy case, the $\alpha$ most frequently recorded transitions
will correspond to the allowable ones. Indeed, if the clusters
are separated so that the probability of misclassification in
$I(y(n))$ is small (see also (AS1)), then the $\alpha$ most frequently
recorded transitions should be distinguishable from the
occasional spurious transitions. This simple procedure seems
able to complete the graph describing the HMM in a
computationally efficient way. However, it cannot provide the
association between each state $k \in \{1, \ldots, \alpha^{q+1}\}$ and the
corresponding $\alpha$-ary vector $\mathbf{w}_k \in A^{q+1}$. In other words,
although the noiseless channel outputs $h_k$ have been estimated
for every state $k$, and the transition graph has been
completed, the channel mapping $h[\mathbf{w}_k]$ has not been identified
yet, because the correspondence $k \leftrightarrow \mathbf{w}_k$ has not been
obtained. This association is crucial in using the HMM to
decode the input and needs to be recovered.
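
The transition-recording step can be sketched as below. The function name `estimate_children` and the 0-based state indices are assumptions of this example; the idea of keeping the $\alpha$ most frequent successors of each state is taken from the text.

```python
import numpy as np

def estimate_children(state_seq, n_states, alpha):
    """Count transitions in the cluster-index sequence and keep, for each
    state, its alpha most frequently observed successors."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(state_seq[:-1], state_seq[1:]):
        counts[a, b] += 1
    children = {}
    for k in range(n_states):
        top = np.argsort(counts[k])[::-1][:alpha]  # most frequent first
        children[k] = sorted(top.tolist())
    return counts, children
```

Occasional misclassified points only add a few spurious counts, so they do not change the selected successors as long as every allowable transition is observed much more often.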

The problem can be equivalently posed as labeling each
state $k \in \{1, \ldots, \alpha^{q+1}\}$ using a $q+1$ length vector
$\mathbf{w}_k = [w_{k1}, w_{k2}, \ldots, w_{k,q+1}]$ of $\alpha$-ary symbols $w_{ki} \in A$,
$i = 1, \ldots, q+1$, in a way that is consistent with the channel
operations. The main contribution of our work is in
employing graph-theoretic tools and developing an algorithm
to solve this association problem.

5.  IDENTIFIABILITY 

It is clear that for a general finite state model, there is
a large number of different labelings possible, and further
information is required to complete that task. HMMs
describing communication channels, however, are of a special
form and admit a unique labeling (under certain inherent
ambiguities) as shown next. The key observation is
that for a communication channel graph, a state transition
from $\mathbf{w}_k = [w_{k1}, w_{k2}, \ldots, w_{k,q+1}]$ to $\mathbf{w}_l$ is valid only
if $\mathbf{w}_l = [w_{k2}, \ldots, w_{k,q+1}, w]$ for some $w \in A$. In other words, the
channel acts as a shift register, at each transition shifting
$w_{k2}, \ldots, w_{k,q+1}$ and incorporating the new data point $w$.

Graphs describing such systems are called De Bruijn
graphs [7], and have been extensively studied in the area
of computer science. They find applications in many diverse
problems from coding theory to routing in computer
networks. However, to the best of our knowledge, they have
not been studied from an identifiability/labeling viewpoint.
The main identifiability result developed in this paper is
summarized in the following proposition.

Proposition: Every De Bruijn graph admits a unique
labeling of its states, within a permutation of the alphabet
letters. □

The proof is constructive and actually provides an
algorithm to implement the association (labeling) procedure.
Before developing the proof, it will be useful to note that
every De Bruijn graph has exactly $\alpha$ nodes with self-loops.
These correspond to the states $\mathbf{w}_{a_1} = [a_1, a_1, \ldots, a_1]$,
$\mathbf{w}_{a_2} = [a_2, a_2, \ldots, a_2]$, ..., $\mathbf{w}_{a_\alpha} = [a_\alpha, a_\alpha, \ldots, a_\alpha]$,
and can be identified as the $\alpha$ non-zero entries in the
diagonal of the state transition matrix. Since permutations of
those $\alpha$ labels simply correspond to permutations of the
input alphabet symbols, we can assume without loss of
generality that $\mathbf{w}_{a_1}, \mathbf{w}_{a_2}, \ldots, \mathbf{w}_{a_\alpha}$ are given; we call those nodes
the roots of the graph. Also, for every node $\mathbf{w}_k$, we call the
nodes that are accessible with one transition the children
of $\mathbf{w}_k$. With this terminology established, the proposition's
proof is based on the following lemma.

Lemma:  If  the  following  information  is  given  about  a  De 
Bruijn  graph: 

i)  the  roots’  labels, 

ii)  the  label  of  an  arbitrary  node, 

then  the  labels  of  that  node’s  children  are  unique.  □ 

Proof: Consider an arbitrary node $\mathbf{w}_k = [w_{k1}, w_{k2}, \ldots,
w_{k,q+1}]$, and one of its children $\mathbf{w}_l = [w_{k2}, \ldots, w_{k,q+1}, w]$.
Let us assume that there are two different valid labelings
for $\mathbf{w}_l$, one corresponding (without loss of generality) to
$w = a_1$, and one to $w \neq a_1$. Let us compute the shortest
path from $\mathbf{w}_l$ to the roots $\mathbf{w}_{a_1}, \mathbf{w}_{a_2}, \ldots, \mathbf{w}_{a_\alpha}$ (i.e., the
minimum number of transitions or shifts required). If $w = a_1$,
then clearly at most $q$ shifts are needed to arrive at $\mathbf{w}_{a_1}$,
while $q+1$ shifts are needed to arrive at any of the other
roots $\mathbf{w}_{a_2}, \ldots, \mathbf{w}_{a_\alpha}$. If however $w \neq a_1$, $q+1$ shifts are
needed to arrive at $\mathbf{w}_{a_1}$, which is a contradiction. Hence,
both labelings cannot be valid. □

The proof of the lemma indicates a labeling procedure;
namely starting by arbitrarily labeling the roots $\mathbf{w}_{a_1}$,
$\mathbf{w}_{a_2}, \ldots, \mathbf{w}_{a_\alpha}$, and proceeding by systematically visiting the
remaining nodes and labeling them according to the shortest
path test. The details of the algorithm are explained
next.


6.  PROPOSED  ALGORITHM 

The  proposed  algorithm  consists  of  four  steps,  namely  clus¬ 
tering,  transition  matrix  estimation,  labeling  and  decoding: 

Step 1: (Clustering) Obtain the cluster centers $h_k$
and cluster membership function $I(y(n))$ from (2), using
the K-means algorithm.

Step 2: (Transition matrix) For every state $k_0$, obtain its
children as the $\alpha$ most frequent transitions from $k_0$ in the
sequence $I(y(n))$.

Step 3: (Labeling)

Initialization:

(i) Mark all nodes as not_visited

(ii) Find the $\alpha$ non-zero entries in the diagonal of
the transition matrix (roots), and label them
arbitrarily as $\mathbf{w}_{a_1}, \mathbf{w}_{a_2}, \ldots, \mathbf{w}_{a_\alpha}$.

(iii) Put the roots in the to_visit queue.

Recursion:

while to_visit queue is non-empty,

(i) Get first node out of the queue

(ii) If not_visited then

(iia) Get its $\alpha$ children

(iib) Put them in the to_visit queue

(iic) Mark the node as visited

(iid) Compute the shortest path between each
child and the $\alpha$ roots.

(iie) Label each child using the root corresponding
to the smallest shortest path value.

Step  4:  Employ  the  Viterbi  algorithm  to  recover  the  trans¬ 
mitted  symbols. 
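
The labeling recursion of Step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: symbols are represented by the integers 0, ..., $\alpha - 1$, the helper names (`build_de_bruijn`, `dist_to_target`, `label_states`) are hypothetical, and breadth-first search replaces Dijkstra's algorithm because every transition has unit cost.

```python
from collections import deque
from itertools import product
import random

def build_de_bruijn(alpha, q, seed=0):
    """Test graph: children lists with shuffled node ids, plus the hidden labels."""
    states = list(product(range(alpha), repeat=q + 1))
    ids = list(range(len(states)))
    random.Random(seed).shuffle(ids)
    node_of = dict(zip(states, ids))
    children = {node_of[s]: [node_of[s[1:] + (w,)] for w in range(alpha)]
                for s in states}
    true_labels = {node_of[s]: s for s in states}
    return children, true_labels

def dist_to_target(children, target):
    """Number of transitions from every node to target (BFS on reversed edges)."""
    parents = {n: [] for n in children}
    for n, cs in children.items():
        for c in cs:
            parents[c].append(n)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        n = queue.popleft()
        for p in parents[n]:
            if p not in dist:
                dist[p] = dist[n] + 1
                queue.append(p)
    return dist

def label_states(children, q):
    """Label every node with a (q+1)-tuple, up to a permutation of the alphabet."""
    roots = sorted(n for n, cs in children.items() if n in cs)  # self-loops
    dists = [dist_to_target(children, r) for r in roots]
    labels = {r: (i,) * (q + 1) for i, r in enumerate(roots)}   # arbitrary root labels
    visited = set()
    to_visit = deque(roots)
    while to_visit:
        node = to_visit.popleft()
        if node in visited:
            continue
        visited.add(node)
        for child in children[node]:
            # New symbol: the root reachable from the child in the fewest shifts.
            w = min(range(len(roots)), key=lambda i: dists[i][child])
            labels[child] = labels[node][1:] + (w,)
            to_visit.append(child)
    return labels, roots
```

The distances to each root are computed once up front instead of once per child; this changes the bookkeeping but not the shortest path test itself.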

7.  DISCUSSION 

Some remarks on the proposed algorithm are now in order:

1. There are many computationally efficient algorithms
to compute the shortest path from one node to every other
node in the graph (e.g., Dijkstra's algorithm [7]). Their
complexity is on the order of $Q \log Q$, where $Q = \alpha^{q+1}$.
Hence, the complexity of the proposed labeling scheme is
$\alpha Q^2 \log Q$.

2. The proposed method is suboptimal when compared
with the maximum likelihood (ML) solution provided by
the Baum-Welch algorithm. However, its computational
requirements and convergence properties may be better.
In medium to high SNR, clustering methods are known to
perform satisfactorily, and may suffer less than ML methods
from local minima problems. If further computational
power is available, the results from the proposed method
can be used as initial conditions for the Baum-Welch
algorithm, to improve the estimation accuracy.

3.  As  a  final  remark,  it  should  be  noted  that  the  ML 
methods  suffer  from  the  same  identifiability  problem  (i.e., 
ambiguity  with  respect  to  permutations  of  the  alphabet 
symbols),  which  is  inherent  to  the  blind  problem  at  hand. 

8.  SIMULATIONS 

In this section, some simulation results are presented in
order to evaluate the performance of the proposed method. In
all the simulations, the data were generated by filtering the
transmitted sequence (N=800) through a linear channel of
order $q = 2$ with $H(z) = 1 + (1+0.5i)z^{-1} + (1-0.5i)z^{-2}$, and
passing the output through a memoryless nonlinear model
of a 'travelling wave tube amplifier' [6] which is employed in
satellite communications. The model parameters used were
the ones proposed in [6] (see Fig. 5 in [6]).

The transmitted and received BPSK symbols for the
nonlinear channel are plotted respectively in Fig. 2a and 2b.
Additive Gaussian noise of SNR = 12 dB was added. The
output of the clustering algorithm is shown in Fig. 2c,
where each data point is assigned to one of eight possible
clusters (original cluster centers, estimated cluster centers
and received data are depicted by o, x, - respectively).
Finally, in Fig. 2d the output of the Viterbi algorithm is
shown where only one error has occurred. Fig. 3 also
depicts the performance of the Viterbi decoder by showing
the original and estimated state sequence (Fig. 3a and 3b)
as well as the state error sequence (Fig. 3c). Only one
excursion from the correct state sequence was observed in
this data sequence. Fig. 4 shows the magnitude of the
estimation error,

$$\sum_{k} |h(\mathbf{w}_k) - \hat{h}_k|^2 , \qquad (3)$$

experienced by the B-W algorithm at each iteration. In Fig.
4b the results from the proposed method were used as
initial conditions, while in Fig. 4a the initial conditions were
arbitrarily chosen close to zero according to the suggestions
of [10]. Notice the difference in the number of iterations
needed for convergence when good initial conditions are
provided. Results from 100 Monte Carlo runs (probability of
error versus SNR) of the proposed method (solid line) and
the Baum-Welch algorithm (dashed line) with initial
conditions from the proposed method are shown in Fig. 5. As
expected, the Baum-Welch algorithm is slightly superior to
the proposed method when properly initialized.
ABSTRACT

Blind equalization of rapidly changing multipath channels
is important for mobile communications, and is addressed
here by expanding the time-varying channel over
a basis. Time-variation offers diversity and in contrast to
time-invariant channels, blind equalization of time-varying
channels becomes possible under mild conditions provided
that the expansion components can be separated.
Multichannel data are needed for the challenging non-separable
case where it is shown that unique vector FIR equalizers
exist under certain channel co-primeness conditions. Apart
from persistence of excitation, no extra restriction is imposed
on the input. Order selection methods and blind
equalizers are derived directly from the output data.
Preliminary simulations are also presented.


1.  MOTIVATION  AND  MODELING 

In wireless communications, multipath environments may
change with time as the mobile communicators move. If
the resulting time-varying (TV) channels exhibit variations
which are too rapid for an adaptive algorithm to track,
explicit modeling of the variation is well motivated.

Let the discrete-time data $x(n)$, $n = 0, \ldots, N-1$, be

$$x(n) = \sum_{l=0}^{L} h(n;l)\, s(n-l) + v(n) , \qquad (1)$$

where the inaccessible input (or source) $s(n)$ is allowed to
be deterministic or random (white or colored) and independent
of the AWGN $v(n)$. The TV impulse response depends
explicitly on time $n$, and we model it using a basis expansion.
Depending on whether the basis modulates the input
or the channel, we model $h(n;l)$ respectively as (see also
Figs. 1 and 2):

$$\text{M1:}\quad h(n;l) = \sum_{q=1}^{Q} h_q(l)\, b_q(n-l) , \qquad (2)$$

$$\text{M2:}\quad h(n;l) = \sum_{q=1}^{Q} \tilde{h}_q(l)\, \tilde{b}_q(n) . \qquad (3)$$

Expansion coefficients $h_q$ ($\tilde{h}_q$) are time-invariant (TI), while
the known basis sequences $b_q$ ($\tilde{b}_q$) capture the time-variation
and, depending on the application, are chosen a priori as
e.g., polynomials, complex exponentials, or wavelets. To
allow for orders that vary with time or lag we define
$L = \max_n L_n$ and $Q = \max_l Q_l$. Basis expansions for non-blind
TV modeling were reported in [2].
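
To make the difference between the two models concrete, the noise-free single-channel output of (1) can be generated as in the sketch below. The function name `tv_channel_output`, the array layout of `h` and `b`, and the convention that the input is zero before $n = 0$ are assumptions of this example, not part of the paper.

```python
import numpy as np

def tv_channel_output(s, h, b, model="M1"):
    """Noise-free SISO output of the basis-expansion channel.

    s : (N,) input symbols
    h : (Q, L+1) time-invariant expansion coefficients h_q(l)
    b : (Q, N) known basis sequences b_q(n)
    Inputs before n = 0 are taken as zero (an assumption of this sketch).
    """
    s = np.asarray(s, dtype=complex)
    h = np.asarray(h, dtype=complex)
    b = np.asarray(b, dtype=complex)
    N = s.size
    Q, num_taps = h.shape
    x = np.zeros(N, dtype=complex)
    for n in range(N):
        for l in range(min(num_taps, n + 1)):
            for q in range(Q):
                # M1: basis modulates the input; M2: basis modulates the channel.
                basis = b[q, n - l] if model == "M1" else b[q, n]
                x[n] += h[q, l] * basis * s[n - l]
    return x
```

With a constant basis both models reduce to an ordinary time-invariant convolution, which gives a quick sanity check of the indexing.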


Using single- or multi-channel output data only, we first
recast the blind TV equalization as a blind TI source separation
problem (Section 2). Structured time variations in (2)
and (3) offer what we term channel diversity and degrees of
freedom which render the blind TV equalization well-posed
without use of higher-order statistics [8], or restrictive
assumptions on the input (e.g., whiteness [7]). In Section 3,
we consider $M$-channel data $\mathbf{x}'(n) := [x^{(1)}(n) \ldots x^{(M)}(n)]$
with each $x^{(m)}(n)$ obeying (1) and (2) of model M1. The
resulting vector model is:

$$\mathbf{x}'(n) = \sum_{q=1}^{Q} \left[ \sum_{l=0}^{L} \mathbf{h}_q'(l)\, b_q(n-l)\, s(n-l) \right] + \mathbf{v}'(n) , \qquad (4)$$

where prime stands for transpose, lower (upper) bold is used
for vectors (matrices), and the $M \times 1$ vectors $\mathbf{h}_q$ and $\mathbf{v}$ are
defined similar to $\mathbf{x}$. We establish that $M \times 1$ FIR zero-forcing
(or perfect in the absence of noise) equalizers, $\{\mathbf{g}_q^{(d)}(k)\}_{k=0}^{K}$,
exist, so that within a delay $d \in [0, L+K]$ (which is non
identifiable in the blind case) they yield:

$$\sum_{k=0}^{K} \mathbf{x}'(n-k)\, \mathbf{g}_q^{(d)}(k) = s_q(n-d) , \qquad (5)$$

where $s_q(n-d) := b_q(n-d)\, s(n-d)$ denotes the deconvolved
input modulated by the $q$th basis. Determination of $Q$ and
$L$ is addressed, and linear equation based methods to
estimate $\mathbf{g}_q^{(d)}(k)$ directly from output data are also developed
in Section 3.

Model M2 was justified based on the mobile kinematics
in [8, 7] and can be related to M1 via: $\tilde{b}_q(n) = b_q(n-l)$,
$\forall\, l \in [0, L]$. However, as M2 is more general than M1, it
requires separate treatment and direct blind equalizers are
derived in Section 4 (see also [3] for alternative solutions).

The ideas herein are important generalizations of the TI
results in [6, 4, 9, 5, 1] to TV channels. Relative to [7],
the present approach allows for deterministic inputs,
relaxes identifiability conditions, and achieves the same
performance with less data.


2.  CHANNEL  DIVERSITY 
Figs.  1  and  2  illustrate  that  the  TV-SISO  models  of  (2)  and 
(3)  can  be  viewed  as  TI-MIMO  models,  if  the  {^g(n)}^=i 
components  can  be  obtained.  For  example,  if  in  Fig.  2  the 
Xqf(n)  components  can  be  separated  in  the  time-,  frequency- 
or,  cyclic-domain,  the  Xq{n)  =  bq^{n)xq{n)  outputs  can  be 
recovered  by  demodulating  with  the  known  bases.  Blind 
equalization  can  then  be  achieved  using  existing  multichan¬ 
nel  approaches  (e.g.,  [4,  9]).  Hence,  time- variation  (not 
necessarily  that  arising  due  to  fractional  sampling)  offers 
diversity which renders blind equalization of TV channels
easier  than  that  of  TI  channels  when  separation  is  achiev¬ 
able.  The  latter  is  possible  if  for  example  hq{n)  channels 
are  low-pass  and  hq{n)  are  band-pass  with  center  frequen¬ 
cies  far  apart  from  each  other.  Intuitively  speaking,  a  TV 
channel  offers  us  “different  views  for  each  time  point  n,” 
and  hence  the  term  channel  diversity. 

Subsequently, we focus on the more challenging
non-separable case which requires multichannel data $x^{(m)}(n)$,
$m \in [1, M]$, $n \in [0, N-1]$. Multichannel data become
available either with multiple antennas (see e.g., [4], [3]),
or by oversampling the continuous counterpart of (1).

To illustrate the latter, consider the (baseband)
continuous-time data: $x_c(t) = \sum_l s(l)\, h_c(t; t - lT)$, where
$T$ is the symbol period. With oversampling rate $M/T$, we
obtain: $x(n) := x_c(t)|_{t=nT/M} = \sum_l s(l)\, h(n; n - lM)$, and
upon defining the sub-processes $x^{(m)}(n) := x(nM + m - 1)$,
we find: $x^{(m)}(n) = \sum_l h^{(m)}(n; l)\, s(n-l)$ for $m = 1, \ldots, M$.
Oversampling creates multiple channels but in contrast to
the TI case, $x(n)$ is not necessarily cyclostationary and
$x^{(m)}(n)$ is not necessarily stationary. Our results reveal
that channel time variation (not necessarily periodic) may
be sufficient to deal with blind problems.


3.  MODEL  Ml:  BLIND  EQUALIZATION 

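Let $\mathbf{s}_q'(n) := [b_q(n)s(n) \ldots b_q(n-L-K)s(n-L-K)]$ and
define for each $q \in [1, Q]$ the $(L+K+1) \times M(K+1)$ block
Toeplitz matrix

$$\mathbf{H}_q := \begin{bmatrix} \mathbf{h}_q'(0) & & \mathbf{0}' \\ \vdots & \ddots & \\ \mathbf{h}_q'(L) & & \mathbf{h}_q'(0) \\ & \ddots & \vdots \\ \mathbf{0}' & & \mathbf{h}_q'(L) \end{bmatrix} .$$

Consider (4) in the noise-free case and form the $(N-K) \times
M(K+1)$ block Hankel data matrix

$$\mathbf{X} := \begin{bmatrix} \mathbf{x}'(N-1) & \cdots & \mathbf{x}'(N-1-K) \\ \vdots & & \vdots \\ \mathbf{x}'(K) & \cdots & \mathbf{x}'(0) \end{bmatrix} = \mathbf{S}_b \mathbf{H} , \qquad (6)$$

where the $(N-K) \times Q(L+K+1)$ modulated input matrix
$\mathbf{S}_b$ and the $Q(L+K+1) \times M(K+1)$ channel matrix $\mathbf{H}$ are
given by

$$\mathbf{S}_b := \begin{bmatrix} \mathbf{s}_1'(N-1) & \cdots & \mathbf{s}_Q'(N-1) \\ \vdots & & \vdots \\ \mathbf{s}_1'(K) & \cdots & \mathbf{s}_Q'(K) \end{bmatrix} , \qquad \mathbf{H} := \begin{bmatrix} \mathbf{H}_1 \\ \vdots \\ \mathbf{H}_Q \end{bmatrix} .$$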

We assume the following:

(M1.1) $N - K > M(K+1)$, which is satisfied by collecting
sufficient data;

(M1.2.1) $\mathbf{H}$ is at least fat; i.e., $(M, L, Q, K)$ obey

$$M(K+1) \geq Q(L+K+1) . \qquad (7)$$

(M1.2.2) $\mathbf{H}$ is square; i.e., (7) holds as equality. To satisfy
(7), a minimum $M_{min} = Q + 1$ channels are required with
a minimum equalizer order $K_{min} = QL - 1$ (in the TI case,
$M_{min} = 2$ and $K_{min} = L - 1$).

(M1.3) $\mathbf{H}$ is full rank; i.e., $\mathrm{rank}(\mathbf{H}) = Q(L+K+1)$, which
implies that transfer functions $\{H_q^{(m)}(z),\ q \in [1, Q],\ m \in
[1, M]\}$ are co-prime. Note that $\{H_{q_1}^{(m)}(z)\}_{m=1}^{M}$ may have
common zeros provided that for some $q_2 \neq q_1$ those are not
also zeros of $\{H_{q_2}^{(m)}(z)\}_{m=1}^{M}$.

(M1.4) bases $b_q(n)$ are sufficiently varying and $s(n)$ is
persistently exciting (p.e.) of sufficient order to assure that
$\mathrm{rank}(\mathbf{S}_b) = Q(L+K+1)$. If modulated inputs $\{s_q(n)\}_{q=1}^{Q}$
are linearly independent (sufficiently distinct modes), then
$\mathbf{S}_b$ is full rank. Note that relative to the TI case, p.e.
conditions on $s(n)$ are relaxed by the modulating bases. We
stress that $s(n)$ can be either random or deterministic.

3.1.  Order  determination 

Under (M1.1)-(M1.4), matrix $\mathbf{X}$ has rank $Q(L+K+1)$.
With $K_1 > K_2$ denoting known upper bounds, the
corresponding matrices $\mathbf{X}_1$, $\mathbf{X}_2$ will have $\mathrm{rank}(\mathbf{X}_i) = Q(L +
K_i + 1)$, $i = 1, 2$. It is thus possible to select the orders $L$
and $Q$ using:

$$Q = \frac{\mathrm{rank}(\mathbf{X}_1) - \mathrm{rank}(\mathbf{X}_2)}{K_1 - K_2} , \qquad L = \frac{\mathrm{rank}(\mathbf{X}_1)}{Q} - K_1 - 1 .$$

With $Q$, $L$ available, $K$ is chosen to satisfy (7) for a given
$M \geq Q + 1$.
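
As a small worked example of the rank relations above and of condition (7), the arithmetic can be written out as follows. The function names are hypothetical, and the ranks are assumed to have been estimated beforehand (e.g., from the singular values of the data matrices).

```python
def orders_from_ranks(rank1, rank2, K1, K2):
    """Solve rank(X_i) = Q (L + K_i + 1), i = 1, 2, for Q and L."""
    Q, rem = divmod(rank1 - rank2, K1 - K2)
    if rem != 0 or Q <= 0:
        raise ValueError("ranks are inconsistent with the model")
    L_plus, rem = divmod(rank1, Q)
    if rem != 0:
        raise ValueError("ranks are inconsistent with the model")
    return Q, L_plus - K1 - 1

def min_equalizer_order(M, Q, L):
    """Smallest K with M (K + 1) >= Q (L + K + 1); needs M > Q."""
    if M <= Q:
        raise ValueError("condition (7) cannot hold unless M > Q")
    # (M - Q)(K + 1) >= Q L  =>  K + 1 >= ceil(Q L / (M - Q))
    return max(0, -(-Q * L // (M - Q)) - 1)
```

For instance, with $Q = 2$, $L = 3$ and $M = Q + 1 = 3$ channels, the second function returns $K = QL - 1 = 5$, in agreement with (M1.2.2).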

3.2.  Existence  and  uniqueness 

Under (M1.2.1) and (M1.3), we infer from $\mathbf{X} = \mathbf{S}_b \mathbf{H}$
that a unique linear FIR equalizer exists to yield GX = Sb.

Matrix $\mathbf{G}$ is the pseudo-inverse $\mathbf{H}^{\dagger}$, which under (M1.2.2)
becomes $\mathbf{H}^{-1}$. Because (7) is not satisfied with $M = 1$, it
follows that blind separation and equalization of TV channels
is impossible in the SISO case under (M1.1)-(M1.4).
The larger number of channels ($M_{min} = Q + 1$) required relative to the
TI case is the price paid for our ability to equalize (and thus
invert) FIR TV channels with linear FIR equalizers.

3.3.  Direct  blind  equalizers 

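Blind equalizers exist and are unique but in order to find
them we first set $n = N-1, \ldots, K$ in (5) and collect
equations to obtain

$$\mathbf{X} \mathbf{g}_q^{(d)} = \mathbf{s}_q^{(d)} := \mathbf{B}_q^{(d)} \mathbf{s}^{(d)} , \qquad (8)$$

where $\mathbf{g}_q^{(d)} := [\mathbf{g}_q^{(d)\prime}(0) \ldots \mathbf{g}_q^{(d)\prime}(K)]'$, $\mathbf{s}_q^{(d)} := [b_q(N-1-d)\,
s(N-1-d) \ldots b_q(K-d)\, s(K-d)]'$, $\mathbf{B}_q^{(d)} := \mathrm{diag}[b_q(N-1-d)
\ldots b_q(K-d)]$, and $\mathbf{s}^{(d)} := [s(N-1-d) \ldots s(K-d)]'$.
We use Matlab's notation $\mathbf{X}(i_1 : i_2, :)$ to denote a submatrix
of $\mathbf{X}$ formed by the $i_1$ through $i_2$ rows and all columns of
$\mathbf{X}$. Next we define

$$\mathbf{X}_{0,d} := \mathbf{X}(d+1 : N-K, :) , \qquad \mathbf{X}_d := \mathbf{X}(1 : N-K-d, :) , \qquad (9)$$

and $\mathbf{B}_q^{(0,d)} := \mathbf{B}_q^{(0)}(d+1 : N-K,\, d+1 : N-K)$,
$\bar{\mathbf{B}}_q^{(d)} := \mathbf{B}_q^{(d)}(1 : N-K-d,\, 1 : N-K-d)$.
From (8) and (9) it follows that

$$\mathbf{X}_d\, \mathbf{g}_q^{(d)} = \bar{\mathbf{B}}_q^{(d)}\, \mathbf{s}^{(d)}(1 : N-K-d) . \qquad (10)$$

We note that $\mathbf{s}^{(0)}(d+1 : N-K) = \mathbf{s}^{(d)}(1 : N-K-d)$,
which allows us to eliminate the input dependence from the
equations in (10) and obtain the cross-relation: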

Xo,d  4°)  =  Xd  4^)  .  (11) 

The  pair  of  equalizers  (^i\^2^)  will  be  identifiable  (up  to 
a  scale)  as  the  eigenvector  corresponding  to  the  minimum 
eigenvalue  of  X^'^2 


Sq\ 

Jd) 

Oq2 


=  0 


provided  that  the  nullity  =  1.  It  turns  out 

that  under  (Ml. 2. 2),  (Ml. 3),  the  latter  holds  for  the 
minimum-  and  maximum-delay  equalizers  corresponding  to 
(0,  d)  =  (0,  L-\-K),  provided  that  rank(S6)  =  2Q(L-\-K-^1), 
Under  (Ml.l)-(M1.4),  equalizers  corresponding  to  all 
possible  delays  d  €  [0,  i  -h  iif]  and  gi ,  ^2  €  [1,Q]  can  be 
found  simultaneously;  e.g.,  with  d  =  L  K,  qi  ~  and 
52  =  1 ...  Q,  we  obtain: 


r  b<‘+'^)Xo.^+k  _B(0,L+K)X 


L+K 


...  0 


... 


L+K 


X 


(13) 


To recover $s(n-d)$ from the equalizer's output,
we simply demodulate with $b_q^{-1}(n-d)$ to obtain
$b_q^{-1}(n-d)\, s_q(n-d) = s(n-d)$. For each $n$, it suffices to have
$b_{q_0}(n) \neq 0$ for at least one $q_0 \in [1, Q]$. If more than one
basis is non-zero, we may align and average the
corresponding equalizer outputs.

With  the  input  available,  one  may  readily  obtain  chan¬ 
nel  estimates  (if  so  desired)  by  solving  (4)  using  standard 
regression  techniques  (see  e.g.,  [2]). 




4.  MODEL  M2:  BLIND  EQUALIZATION 

The counterpart of (4) for the model M2 of (3) is given by
(see also Fig. 2)

$$\mathbf{x}'(n) = \sum_{q=1}^{Q} \left[ \sum_{l=0}^{L} \tilde{\mathbf{h}}_q'(l)\, \tilde{b}_q(n)\, s(n-l) \right] + \mathbf{v}'(n) , \qquad (14)$$
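but instead of (5), we form the $N \times M$ matrix $\mathbf{X} = \tilde{\mathbf{S}}_b \tilde{\mathbf{H}}$,
as follows [3]:

$$\begin{bmatrix} \mathbf{x}'(N-1) \\ \vdots \\ \mathbf{x}'(0) \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{s}}_1'(N-1) & \cdots & \tilde{\mathbf{s}}_Q'(N-1) \\ \vdots & & \vdots \\ \tilde{\mathbf{s}}_1'(0) & \cdots & \tilde{\mathbf{s}}_Q'(0) \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{H}}_1 \\ \vdots \\ \tilde{\mathbf{H}}_Q \end{bmatrix} . \qquad (15)$$

The entries in the $N \times Q(L+1)$ input matrix $\tilde{\mathbf{S}}_b$ and the
$Q(L+1) \times M$ channel matrix $\tilde{\mathbf{H}}$ are given by:

$$\tilde{\mathbf{s}}_q(n) := \tilde{b}_q(n) \begin{bmatrix} s(n) \\ \vdots \\ s(n-L) \end{bmatrix} , \qquad \tilde{\mathbf{H}}_q := \begin{bmatrix} \tilde{\mathbf{h}}_q'(0) \\ \vdots \\ \tilde{\mathbf{h}}_q'(L) \end{bmatrix} .$$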
Assumptions (M1.1)-(M1.4) are replaced by:

(M2.1) $N > M$;

(M2.2) triplet $(M, L, Q)$ obeys:

$$M \geq Q(L+1) , \qquad (16)$$

which compared to (7) requires more channels;

(M2.3) $\mathrm{rank}(\tilde{\mathbf{H}}) = Q(L+1)$;

(M2.4) $\mathrm{rank}(\tilde{\mathbf{S}}_b) = Q(L+1)$.

Under (M2.1)-(M2.4), we have $\mathrm{rank}(\mathbf{X}) = Q(L+1)$.
If $\bar{L}$ is a known upper bound on $L$, then order $L$ can be
obtained as $[\mathrm{rank}(\mathbf{X})/Q] - 1$, but $Q$ must be known.

With  G  =  H^,  (M2.1)-(M2.3)  guarantee  that  unique 
linear  FIR  equalizers  exist  to  recover  Sb  from  (15).  If  (16) 
holds  as  equality,  then  G  = 

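To derive blind equalizers directly from the data $\mathbf{x}(n)$,
we follow the notation of Section 3 and start from the zero-forcing
condition: $\mathbf{x}'(n)\,\mathbf{g}_q^{(d)} = \hat{s}_q(n) := b_q(n)\,s(n-d)$, which
for $n = N-1, \ldots, 0$ leads to the matrix form

$$\mathbf{X}\,\mathbf{g}_q^{(d)} = \hat{\mathbf{s}}_q^{(d)} := \mathbf{B}_q\,\mathbf{s}^{(d)}, \qquad (17)$$

where $\mathbf{B}_q := \mathrm{diag}[b_q(N-1) \ldots b_q(0)]$ and
$\mathbf{s}^{(d)} := [s(N-1-d) \ldots s(-d)]'$. When compared to (5), the
equalizers for model M2 have order $K = 0$. Defining $\mathbf{X}_{0,d}$ and
$\mathbf{X}_d$ as in (9) with $K = 0$, and the corresponding basis matrices
$\mathbf{B}$ accordingly, we can eliminate $\mathbf{s}^{(d)}$ from (17) and
arrive at the cross-relation: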

Xo,.  gW  =  Xa  .  (18) 

Based on (18) and selecting $M = Q(L+1)$ in (16), the $(0, L)$
pair or all equalizers can be recovered by solving for the
minimum eigenvalues of equations similar to (12) or (13);
see also [3] for $M > Q(L+1)$ with $q_1 = q_2$. In addition to
(16), models M1 and M2 have different input matrices ($\mathbf{S}_b$
and $\bar{\mathbf{S}}_b$) with correspondingly different decompositions:

$$\mathbf{S}_b = \begin{bmatrix} \mathbf{s}'(N-1)\,[\mathbf{B}_1(N-1) \ldots \mathbf{B}_Q(N-1)] \\ \vdots \\ \mathbf{s}'(K)\,[\mathbf{B}_1(K) \ldots \mathbf{B}_Q(K)] \end{bmatrix}, \qquad \bar{\mathbf{S}}_b = [\mathbf{B}_1\mathbf{S} \ldots \mathbf{B}_Q\mathbf{S}].$$

Further research is required to characterize and compare
p.e. requirements in terms of the spectrum of $s(n)$ and
linear independence conditions among the basis sequences.
Additional topics include development of adaptive algorithms,
model validation studies, and especially for model
M2, order and basis selection criteria.

Although noise is included in our simulations, the zero-forcing
equalizers were derived in the noise-free case. Arguing
as in [1], it follows that the minimum norm solution in
(12), (13), or (18) minimizes the noise power at the equalizer
output. Future work will include noise explicitly using
the linear prediction framework along the lines of [5].

5. SIMULATIONS

We generated $N = 300$ QPSK samples and, with $Q = 2$ basis
sequences, $b_1(n) = 1$, $b_2(n) = \exp(j2\pi n/50)$, we formed
data $\mathbf{x}(n)$ according to models M1 and M2. Outputs of
complex channels (order $L = 3$) were received by $M = Q + 1 = 3$
antennas for M1, and $M = Q(L+1) = 8$ for M2. One
realization of the eye diagrams before and after equalization
is shown in Fig. 3 at SNR = 40 dB for M1. Corresponding
diagrams for M2 at SNR = 25 dB are depicted in Fig. 4.
The equalizer order for M1 was $K = QL - 1 = 5$, and
its coefficients were obtained by solving (12) with $q_1 = 1$,
$q_2 = 2$, and $d = L + K = 8$. For M2, equalizer coefficients
($K = 0$) were found via (18) with $q_1 = 1$, $q_2 = 2$, and
$d = L$. To illustrate the importance of TV modeling,
we show in Fig. 5 how the TI equalizers obtained from [9]
perform on the data of Figs. 3 and 4. RMS performance of
the errors $\hat{s}(n) - s(n)$ vs. SNR is plotted in Fig. 6 for M1
and M2 based on $N = 150$ samples and 100 Monte Carlo
runs. M2 was less sensitive to noise than M1, which also
appeared less robust to basis mismatch and p.e. conditions.

Figure 3. Eye diagrams for model M1

Figure 4. Eye diagrams for model M2

Figure 5. TI equalizers on TV models

Figure 6. RMSE performance of M1 and M2

Abstract

Afferent, whole nerve signals recorded using an implanted
nerve-cuff electrode were analyzed using three
detectors based on the 1st, 2nd and 3rd order statistical
properties of the signals. Results based on standard
Rectified, Bin-Integrated (1st order statistical) processing
are compared with two algorithms based upon a Singular
Value Decomposition (SVD) of the signal’s 2nd and 3rd
order correlation (cumulant) matrices. Due to the very
low signal levels obtainable from nerve-cuff electrodes
and the high levels of interference from adjacent muscles,
the overall signal-to-noise ratio (SNR) is very poor. In
addition, the noise level is non-stationary. The inherent
properties of the 3rd order statistics of these signals yield
a detector that performs better than the other two.

1. Introduction

It has been known for more than 100 years that animal
muscle tissue can be made to contract through application
of electrical current. More recently, this has been applied
in the development of Functional Electrical Stimulation
(FES) systems, with the goal of restoring lost motor function
in paralyzed individuals. More than 30 years of FES
development have led to the now generally accepted conclusion
that, in order to reduce muscle fatigue and
increase reliability, closed-loop systems, in which some
sort of “feedback” information is used to control the
stimulator’s parameters, yield better results than simple
open-loop systems. In restoring muscle function via FES,
the goal is to emulate, as closely as possible, the body’s lost
natural functionality. Given the choice of using artificial
sensors (goniometers, strain-gauges, accelerometers, etc.)
versus utilizing the subject’s still intact sensory system,
the latter is likely to provide us with the closest emulation
of the body’s natural control system. In order for the
body’s natural sensors to be used effectively, the level of
information obtained from them should be comparable to
that obtainable from artificial sensors. This requires a
reliable, stable, implantable transducer which is able to
record the sensory signals (known as “afferent” nerve signals)
being passed along the body’s nerve fibers, from
local touch receptors, to the brain. The only
such device presently suitable for use in humans (where
nerve damage must be avoided) is the nerve-cuff
electrode. Such cuffs are typically constructed from a
silicone insulating tube, in which 3 non-insulated rings of
stainless-steel or platinum wire act as electrodes. The
cuff, which is slit longitudinally, is opened, placed around
the nerve, and sutured closed. Lead wires connecting to
the ring electrodes are routed to an appropriate exit site
and through the skin, where they are attached to an
external connector. For our purposes, these electrodes are
connected to a special high-gain (110,000×), low-noise
amplifier. The resulting amplified nerve signal is
commonly referred to as the Electroneurogram (ENG).

We have constructed a prosthetic device utilizing this
ENG signal (a “neuralprosthetic”) in which a custom
designed DSP-based system controls an 8-channel FES
stimulator. The entire device is small enough to be easily
borne by the subject, and uses standard rechargeable
batteries. Natural sensory information can be applied to a
variety of FES tasks. We have primarily been concerned
with two: Hand Grasp Restoration in Tetraplegics, and
Hemiplegic Drop-foot Correction. Tetraplegic subjects,
who have limited use of their arms, are typically unable to
firmly grasp objects. Through stimulation of the muscles
in the hand and forearm, simple grasp functions can be
restored, using the processed nerve signal as a feedback
signal indicating when, due to insufficient stimulation,
the grasped object begins to “slip”. Subjects suffering
from a “drop-foot” are unable to fully activate the muscles
which rotate the foot up/down. Thus, because they cannot
achieve adequate toe clearance, they are unable to walk
normally. Stimulation of these muscles can improve such
subjects’ gait, provided it occurs at the correct time in the
gait cycle. Timing has, traditionally, been determined via
a mechanical switch placed in the subject’s shoe, which
turns stimulation off upon closure (heel-contact) and on
upon opening (heel-lift). We have previously shown that
the nerve signal recorded by nerve-cuff electrodes can be
used as a sort of “natural” heel-contact switch [2]. In
both applications, the fundamental problem is the reliable
detection of the presence of nerve signal activity in
background noise. Essentially then, the problem reduces
to one of pure endpoint or transition detection in the drop-foot
application.

2.  Considerations  Specific  to  this  Problem 

There  are  certain  aspects  of  the  present  problem  (in  the 
use  of  human  nerve  signals)  that  complicate  detection: 

•  The noise is some non-deterministic combination of
tonic nerve firing, electrode thermal noise, and amplifier
1/f noise. Although, in the strictest sense, due to the presence
of background (tonic) nerve firing, this isn’t pure
noise, in practice it is dominated by the thermal and 1/f
components of the electrodes and amplifier. In order to
fully activate the paralyzed muscles using FES, it is often
necessary to apply stimulation voltage pulses in excess of
140 V to the skin’s surface. These pulses (typically under
300 msec in duration) propagate through the body (acting
as a volume conductor) and induce large stimulation
artifact impulses in the recorded nerve signal. Also, the
Electromyographic (EMG) signal from adjacent muscles,
either naturally occurring through voluntary activation, or
stimulation induced, acts as a high level noise source. In
addition, external EMF sources (typically mains power)
are often of sufficient intensity to induce large noise
potentials. The nerve signal amplitudes typically recorded
are in the 1-10 µV range for common sensory stimuli.
Therefore, the initial SNR of these raw nerve signals is
often as low as -60 dB! Fortunately, it is known that the
majority of nerve signal information is confined to a
narrow frequency band, from 1.0 to 3.0 kHz. Therefore, an
important first step in the detection process is the
application of a simple (non-adaptive) bandpass filter.
This filter, combined with other processing (windowing,
adaptive thresholding, etc.), yields nerve signals with
typical SNRs in the range from 0 to +3 dB.

•  The nerve signals recorded by cuff-electrodes are
dominated by the activity from what are termed fast
adapting sensory receptors. These receptors respond,
primarily, to the 1st derivative (i.e. velocity) of applied
force. Consequently, during a period of activity, defined
by the application of a mechanical stimulus to the skin
within the nerve’s innervation area, only the onset and
offset of contact initiate detectably increased nerve
activity. Thus, activity occurs in short bursts where it is
usually not possible to distinguish between force
application and force removal. The practical implication
of this fact for the use of afferent nerve activity in a drop-foot
correction system is a contact onset/offset ambiguity
that must be resolved by other means.

•  All methods we have tried to date rely upon a single
variable test against a fixed threshold. When the value of
the processed ENG signal is below the threshold level, the
null hypothesis $H_0$ is true, and the present state (gait
phase) is unchanged. Upon exceeding the threshold, the
alternative hypothesis, $H_1$, is indicated, and the present
state is toggled (i.e., an edge occurred). Of particular
significance is the constraint that the number of False
Positives (FPs), or erroneous edge detections, be,
essentially, zero. The consequences of an FP are that the
stimulator will be erroneously deactivated while the leg is
still in motion, sufficient toe clearance will not be
maintained, and the subject may fall. Thus, the detection
threshold must be set sufficiently high such that the FP
percentage is low. Conversely, if the threshold is too high,
resulting in missed detections, the stimulator will not be
turned off during the stance (standing) phase, the
subject’s muscles will tire rapidly and, again, the subject
may fall. Thus, ideally, the processed ENG signal, as the
input to the threshold detector, should have a very high
SNR (i.e. the signal amplitude during transitions should
be high, while the background level during constant force
presence/absence should be close to zero). Given low SNR
inputs (+3 dB max.), and very non-stationary conditions
(variable foot contact pressures, variable gait cycle timing,
plus variable muscle and external EMF interference
signals), the demands upon the signal processing
algorithm for robust ENG processing are, indeed, strict!

•  Finally, it is important to note that this is an unconditionally
real-time processing application. Most ENG
processing algorithms have, up until now, primarily been
designed to characterize the properties of afferent nerve-cuff
recordings off-line, and typically used inherently non-real-time
methods, such as ensemble averaging, to
enhance SNRs. When real-time information is desired,
the standard processing method still widely used is to bin-integrate
(over the inter-stimulation pulse interval) the
rectified, filtered signal. Commonly referred to as the RBI
(Rectified, Bin-Integrated) signal, this yields, essentially, a
standard $\ell_p$-norm detector (or the energy over a window,
if the squared signal is integrated), based on the signal’s
1st order statistics. Unfortunately, while simple to implement
(even with analog circuitry), energy detectors perform
poorly on low SNR signals with non-stationary
noise. In order to improve detection reliability, specifically
for the drop-foot application, an adaptive noise
threshold was incorporated into the standard RBI algorithm,
along with a windowed detector [7]. Using these
modifications, we obtained an average detection ratio of
85%, with no FPs. Since this was deemed unacceptable,
we began investigating more robust detectors, in which a
fundamental criterion is the ability to reject non-stationary,
wide-band (essentially white) noise.
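The basic RBI feature and the fixed-threshold test described above can be sketched as below. This is an illustrative sketch only: the bin length, the synthetic burst, and the threshold rule (1.5 times the median feature value) are assumptions chosen for the demonstration, and the adaptive threshold and windowed detector of [7] are not reproduced.

```python
import numpy as np

def rbi(x, bin_len):
    """Rectify x and integrate (sum) it over consecutive bins of bin_len samples."""
    x = np.asarray(x, dtype=float)
    n_bins = len(x) // bin_len                   # drop an incomplete last bin
    rectified = np.abs(x[:n_bins * bin_len])
    return rectified.reshape(n_bins, bin_len).sum(axis=1)

# Synthetic record: unit-variance white noise plus a short 2 kHz burst
rng = np.random.default_rng(0)
fs = 10_000                                      # sampling frequency in Hz
x = rng.standard_normal(4000)
burst = np.arange(1500, 2000)
x[burst] += 3.0 * np.sin(2 * np.pi * 2000 * burst / fs)

feature = rbi(x, bin_len=100)
threshold = 1.5 * np.median(feature)             # hypothetical fixed threshold
hits = feature > threshold                       # True where H1 is indicated
```

Because the feature scales directly with the noise level, a threshold tuned for one noise variance stops working when the variance changes, which is the weakness the text attributes to energy-type detectors.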

It has previously been shown [4], [5] that good
detection reliability is achievable using second- and
higher-order statistics (HOS) on speech signals with
SNRs in the range mentioned above. This observation has
prompted us to investigate the performance of detectors
used for speech signals in the present problem. There are
many similarities between the problems of detecting
speech in noise and nerve-cuff signals in noise, indicating
that similar methods may be applicable. However, one
fundamental difference between speech and nerve signals
is the onset/offset ambiguity issue mentioned above.

2.1  Autocorrelation-based  detectors 

The first, more advanced, detector investigated is based
upon the signal’s 2nd-order statistical properties. The
method is based on the fact that the autocorrelation
matrix $\mathbf{R}$ of a signal that contains only white noise is
diagonal, with all diagonal entries equal to the variance
of the noise, $\sigma^2$. All (say $Q$) eigenvalues of this matrix
are, therefore, equal to $\sigma^2$ as well. If an information
(non-white) component is also present in the signal, then
$\mathbf{R}$ is no longer diagonal, and consequently its (real,
positive) eigenvalues are not all equal. Testing for the
presence of activity in the signal thus becomes equivalent
to testing for (non)equality of the eigenvalues of $\mathbf{R}$, under
the assumption that the additive noise is white. Given
that $\mathbf{R}$ can be estimated from a record of $N$ samples
through the observation matrix $\mathbf{X}$, as $\mathbf{R} = \mathbf{X}\mathbf{X}^T$, the
singular values of $\mathbf{X}$ can be used for the test. These are
obtained using a Singular Value Decomposition (SVD).
It has been shown that a computationally efficient method
of solving the SVD problem, when the data is real-only, is
the use of the Jacobi rotation algorithm [3].

The actual test is performed by comparing the difference
or the ratio of the maximum and the minimum
eigenvalues, not to zero or one, respectively (as would
ideally be the case), but to appropriately set thresholds. In
theory, a significant advantage of this detection method
over the RBI (or energy) method is that it is immune to
the noise level (variance). This is because the white noise
variance acts as a DC offset in the eigenvalue domain,
which doesn’t affect the eigenvalue difference. In practice,
this detector is much more immune to non-stationary
noise levels than the RBI detector, and yields better
detection SNRs. Yet, since it primarily acts as a whiteness
versus non-whiteness test, it is sensitive to the color of the
noise. Note that in our case a significant proportion of
the noise is due to the amplifier’s colored (1/f) noise.
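A minimal sketch of the eigenvalue-spread test follows. The way the observation matrix is built (lagged windows of the record), the normalization by the number of columns, the matrix size $Q = 8$, and the test signals are all assumptions for illustration; a generic SVD routine is used here rather than the Jacobi rotation algorithm of [3].

```python
import numpy as np

def eigen_spread(x, Q):
    """Return (max - min, max / min) of the eigenvalues of the estimated Q x Q
    autocorrelation matrix, computed from the singular values of X."""
    x = np.asarray(x, dtype=float)
    cols = len(x) - Q + 1
    X = np.stack([x[i:i + cols] for i in range(Q)])   # Q x cols lagged windows
    sv = np.linalg.svd(X, compute_uv=False)
    eig = sv ** 2 / cols                               # eigenvalues of X X^T / cols
    return eig.max() - eig.min(), eig.max() / eig.min()

rng = np.random.default_rng(0)
N = 2000
noise = rng.standard_normal(N)                         # white noise only
tone = 2.0 * np.sin(2 * np.pi * 0.2 * np.arange(N))    # non-white component

d_noise, r_noise = eigen_spread(noise, Q=8)
d_sig, r_sig = eigen_spread(noise + tone, Q=8)
_, r_scaled = eigen_spread(3.0 * noise, Q=8)           # same noise, higher level
```

In this finite-sample sketch the eigenvalue ratio is unchanged when the noise is simply rescaled, while the difference still carries a residual dependence on the noise level because the estimated eigenvalues are not exactly equal.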

2.2  Cumulant-based  detectors 

In order to overcome this limitation, detectors based on
the higher-order statistics (HOS) of the data were also
tested. The 3rd-order statistics of a signal provide a
measure of the skewness (difference from the Gaussian
distribution) in the signal’s statistical distribution,
whereas the 2nd order statistics (autocorrelation and
spectrum) only provide information about the signal’s
variance. Detectors based on 3rd-order cumulants have
been successfully employed for speech signals due to the
fact that quadratic phase coupling, present in voiced
speech due to non-linearities in the vocal tract [4], [1],
can be detected using 3rd-order statistics. Although a
precise model for the signals recorded by nerve-cuff
electrodes has yet to be developed, it has been shown [6]
that these signals result from the non-linear combination of a
series of action potentials, themselves modeled by a non-linear
combination of sinusoidal functions. Thus it seems
reasonable to assume that, in analogy with speech signals,
there are significant (i.e. detectable) non-linearities in
nerve-cuff electrode signals. In this case, it can be proven
that the 3rd order cumulant of such signals cannot be zero
for all lags. Thus a detector, using a method similar to
that employed in the eigenvalue-spread algorithm, can be
designed using only the diagonal slice of the 3rd order
cumulant, as follows:

The 3rd order cumulant of a record of data, $x(n)$, is
computed as: $c_3(\tau_0, \tau_1) = (1/N) \sum_n x(n)\,x(n+\tau_0)\,x(n+\tau_1)$
for an appropriate set of lags $(\tau_0, \tau_1)$, lying on the main
diagonal ($\tau_0 = \tau_1$) of the 2-D plane. This is, essentially,
equivalent to computing the correlation between $x(n)$ and
$x^2(n)$. The $Q \times Q$ Toeplitz matrix $\mathbf{C}_3$ is formed from the
first $Q$ diagonal lags (where $Q$ is chosen empirically) and
its SVD is computed, as in the 2nd order case. In the 3rd
order case, however, it is sufficient to simply use the
maximum eigenvalue (rather than the difference between
maximum and minimum) as the single test parameter. In
this case, we are testing the matrix entries against zero as
an indication of the presence of skewed components in the
data (here, noise is assumed to be colored, but non-skewed).
In practice, the maximum eigenvalue is
compared against an empirically determined threshold.
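The diagonal-cumulant statistic can be sketched as below. Several details are assumptions not stated in the text: the record is mean-removed first, the Toeplitz matrix is taken to be symmetric with entry $(i, j)$ equal to the lag $|i - j|$ estimate, and the largest singular value stands in for the maximum eigenvalue. The colored Gaussian noise and the skewed test component are synthetic.

```python
import numpy as np

def diag_cumulant(x, Q):
    """Estimate c3(t, t) = (1/N) sum_n x(n) x(n+t)^2 for t = 0, ..., Q-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # assumption: work with a zero-mean record
    N = len(x)
    return np.array([np.sum(x[:N - t] * x[t:] ** 2) / N for t in range(Q)])

def cumulant_statistic(x, Q):
    """Largest singular value of the Q x Q symmetric Toeplitz matrix C3."""
    c = diag_cumulant(x, Q)
    lag = np.abs(np.subtract.outer(np.arange(Q), np.arange(Q)))
    C3 = c[lag]                         # C3[i, j] = c(|i - j|)
    return np.linalg.svd(C3, compute_uv=False)[0]

rng = np.random.default_rng(0)
N = 50_000
w = rng.standard_normal(N + 1)
colored = w[1:] + 0.8 * w[:-1]                   # colored but non-skewed noise
skewed = 2.0 * (rng.exponential(1.0, N) - 1.0)   # zero-mean, skewed component

stat_noise = cumulant_statistic(colored, Q=4)
stat_signal = cumulant_statistic(colored + skewed, Q=4)
```

For the colored Gaussian noise alone the statistic stays near zero, whereas the skewed component raises it substantially, so a single empirically chosen threshold on this value can separate the two cases in this toy setting.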

The 3rd order method requires slightly more computations
than the 2nd order case; yet, it is substantially less
sensitive to additive (non-stationary) noise variance than
either the RBI or 2nd order methods. This is important in
a neuralprosthetic application where noise levels (and
signal properties in general) vary not only amongst
applications (i.e. the nerve used, its size, the size of the
cuff electrode, etc.), but also amongst patients, and even
with the time after implantation. Finally, the storage
requirements of both the 2nd and 3rd order algorithms are
well within the bounds of the on-chip memory of most
commercial DSPs, in contrast to most frequency domain
(FFT or wavelet) methods, which generally require the
addition of external memory. This is an important
consideration for portable (or implantable) systems, where
low power consumption is essential.

3. Results, Discussion and Conclusion

Figures 1 and 2 show a comparison of the 3 algorithms
described, under non-stationary noise conditions. In
Figure 1, linearly increasing white noise (up to 100% of
nominal) was added to a typical afferent nerve-cuff (ENG)
signal in the region from 6000 to 10000 samples. The increased
amplitude between samples 3000 and 5000 corresponds
to increased nerve activity resulting from a single
mechanical stimulation of the skin in the innervated area.
This is also indicated by the arrow in Figure 2. The ordinate
is in Volts. The nerve-cuff output signal was amplified
by 220,000, filtered with a 4th order Butterworth
bandpass (500 Hz-3 kHz) filter, and digitized to 12 bits
(±5 V range) using a sampling frequency of 10,000 Hz.

Figure 2 shows detection results when the 3 detectors
are applied to the noisy signal in Figure 1. Note that all
three detect the true ENG activity (arrow), although the
noise baseline, which defines the SNR of the detector
(since the data is normalized to the peak value), is highest
for the RBI detector and lowest for the cumulant detector.
Thus the cumulant detector yields the highest SNR and
the RBI detector the lowest, with the eigen-spread detector’s
SNR falling in between. As is evident in Figure 2,
the SNR of the RBI detector decreases markedly with increased
noise power. Both the eigen-spread and cumulant
detectors continue to function at 100% added noise power.

In order for a natural sensory based device to be
accepted in clinical applications, the amount of parameter
adjustment required by the user (or physiotherapist) must
be minimal. This has proven to be a severe drawback with
RBI based detectors. Although we have obtained
reasonable success by adding adaptive noise thresholding
to the basic algorithm, we have not yet achieved a truly
robust RBI implementation that does not require frequent
parameter adjustments. Although it cannot be claimed
that HOS offer the best solution for all types of signals,
our preliminary results show that they hold great promise
in the detection of afferent nerve signals in noise. Further
improvements are anticipated through the use of (i)
automatic thresholding based on a fixed, specified FP
ratio, or (ii) a bi-frequency domain bi-coherence
magnitude/phase detector [1]. Further characterization
of the statistical properties of nerve-cuff signals will be
required to fully optimize future detection algorithms.

Figure 1. ENG signal plus white noise

Figure 2. Results from the three detectors

References

[1] Fackrell J., McLaughlin S., “Detecting Phase Coupling
in Speech Signals,” IEEE Colloquium Digest on
Speech and Image Processing, pp. 4/1-4/8 (1995).

[2] Haugland M., Hoffer J., Sinkjaer T., “Skin Contact
Force Information in Sensory Nerve Signals Recorded
by Implanted Cuff Electrodes,” IEEE Transactions on
Rehab. Engineering, vol. 2, no. 1, pp. 18-28 (1994).

[3] Haykin S., “Adaptive Filter Theory,” 2nd Edition, pp.
418-428, Prentice Hall (1991).

[4] Rangoussi M., Bakamidis S., Carayannis G., “On the
use of SVD and HOS for robust endpoint detection of
speech,” in Levels in Speech Comm.: Relations and
Interactions, Ed. C. Sorin, pp. 267-279, Elsevier (1995).

[5] Rangoussi M., Carayannis G., “Adaptive Detection of
Noisy Speech using Third-Order Statistics,” Intl. J.
Adapt. Contr. & Sig. Proc., Special Issue on HOS,
Wiley (to appear, Dec. 1995).

[6] Stein R., Oguztoreli M., “The Radial Decline of Nerve
Impulses in a Restricted Cylindrical Extracellular
Space,” Biol. Cybernetics, vol. 28, pp. 159-165 (1978).

[7] Upshaw B., Sinkjaer T., “Natural vs. Artificial Sensors
Applied in Peroneal Nerve Stimulation,” Proc. of 5th
Vienna International Workshop on Functional
Electrostimulation, pp. 239-242 (1995).




Unsupervised  and  Non  Parametric  Bayesian  Classifier  for  HOS  Speaker 
Independent  HMM  Based  Isolated  Word  Speech  Recognition  Systems 

M.  Zribi*,  S.  Saoudi**  and  F.  Ghorbel* 

*  Groupe  de  recherche  Images  et  Formes 
Ecole  Nouvelle  d'Ingenieurs  en  Communication 
Institut  National  des  Telecommunications 
Cite  Scientifique  59658  Villeneuve  d'Ascq,  France. 

**  Departement  Signal  et  Communication 
Télécom Bretagne, Plouzané 29285 Brest cedex, France.


Abstract 

Here, we consider a speaker independent Hidden
Markov Model (HMM) based isolated word speech
recognition system. The most general representation of
the probability density function (pdf), in the classical
HMM, is a parametric one (i.e., a Gaussian). We intend
here to derive an unsupervised, non parametric and
multidimensional Bayesian classifier based on the well
known orthogonal probability density function (pdf)
estimator, which does not assume any knowledge of the
distribution of the conditional pdfs of each class. Such a
result becomes possible since this non parametric
estimator is suitable for, and adapted to, the Expectation
Maximization (EM) mixture identification algorithm.

Keywords: Unsupervised non parametric Bayesian
classifier, orthogonal probability density function
estimate, Expectation Maximization, Cepstrum
coefficients, Line Spectrum Pairs, Speech recognition,
Hidden Markov Model.

1.  Introduction 

Let us consider isolated word speech recognition. For
each word of the vocabulary, we want to design a separate
M-state HMM. We represent the speech signal of a given
word as a time sequence of spectral vectors (i.e., the
Cepstrum or the Line Spectrum Pairs (LSP) coefficients).
In a recent study [4], we proved that these two different
sets of acoustic analysis parameters provide
comparable recognition rate performance. In this paper,
we focus our attention on the use of the LSP parameters
instead of the Cepstrum coefficients. The d LSP
coefficients are computed with the antisymmetric form of
the Split Levinson algorithm. This method is shown to be
better, in terms of complexity, than other known
algorithms [8]; d is chosen to be equal to 10. Thus, for
each vocabulary word, we have a training sequence
consisting of observations of d-multivariate LSP. The first
task is to build individual word models by adjusting the
model parameters in order to maximize the likelihood of
the observation sequence. The most general representation
of the conditional pdf for which an estimation procedure
has been formulated is a Gaussian distribution [1,2,5].
The goal here is to make refinements on the pdf
representation so as to improve the capability of modeling
the spoken word sequences. When we want to design a
speech recognition system, two fundamental procedures
are generally required. Firstly, some feature descriptors
are extracted from the observed speech signal. Secondly,
the signal is labeled using a classification rule in the
feature space. Different classification algorithms are used
for this problem. The statistical approach is recognized as
efficient (Hidden Markov Model, Bayesian classification
rule, ...). For automatic speech recognition, the
unsupervised classifier is suitable since it can be
adapted to the speaker. The best statistical classifiers are
those based on the Bayesian classification rule, since it
minimizes the posterior probability of misclassification,
but it usually needs assumptions on the pdf distributions.
However, we verify experimentally that the conditional
distributions of the LSP with respect to a given class are
not close to a parametric one and change considerably
according to the speaker, the pronounced word and so
on. The common recognition mechanism is based on the
HMM and makes use of a diagonal Gaussian output
distribution for each state. It is therefore easy to
show that the usual Gaussian hypothesis is not an efficient
approximation. In this paper, we intend to present an
efficient unsupervised Bayesian classifier without
assumption on the distributions of the LSP coefficients. In
this unsupervised context, the Bayesian classification rule,
which is known for its optimality in the sense of the posterior
probability of misclassification criterion, usually needs
some parametric hypothesis for the conditional
probability density function of each class. Using the
orthogonal probability density function estimate [10], the
suggested classifier algorithm does not need any
assumption on the distribution of the observed data. In
this work, the proposed classifier is designed in two steps.
The mixture identification is the first step. It consists of
the estimation of the mixture parameters: the a priori
probability and the conditional probability density
functions of each class. It will be done with the
Expectation Maximization (EM) algorithm [9]. The
second step consists of the application of the Bayesian
classification rule. The paper is organized as follows: In
section 2, we give some elements of the isolated word
Hidden Markov Model. In section 3, we recall the
classical EM algorithm. Section 4 is devoted to the
presentation of the suggested non parametric classifier.


2. Elements of the isolated word HMM

An isolated word HMM is built up of the following:

i) M, the number of states in the model. We denote by $S_i$ the i-th
state for i = 1, ..., M. For our simulation M is equal to 10.

ii) The d-multivariate pdf $f_i(x)$ for each state $S_i$
($B = \{f_i(x)\}$). The observation is a continuous random variable,
i.e., the Line Spectrum Pairs (LSP) coefficients.

iii) The state transition probability distribution $A = [a_{ij}]$ with
$a_{ij} = \mathrm{Pro}[q_{t+1} = S_j \mid q_t = S_i]$ for
$1 \le i, j \le M$, where $q_t$ denotes the state at time t. For our
case, i.e., the left-right model (see Figure 1), we have $a_{ij} = 0$ for
j < i or j > i+2.

iv) The initial state distribution $\pi = \{p_i\}$ for $1 \le i \le M$,
with $p_i = \mathrm{Pro}[q_1 = S_i]$. For our case, i.e., the left-right
model, we choose $p_1 = 1$ and $p_i = 0$ for i > 1.

The complete specification of an HMM then requires the specification of
M (the number of states), the M continuous d-multivariate pdfs B, the
transition matrix $A = [a_{ij}]$, $1 \le i, j \le M$, and the initial
state distribution $\pi$. For convenience, we use
$\lambda^{(i)} = (A, B, \pi)$ to denote the HMM model for the i-th word.
A block diagram of an isolated word HMM recognizer is given in Figure 2.

3. The classical EM algorithm

The classical EM assumes that the observed data is a realization of a
mixture of parametric distributions, so that its pdf can be written as:

$$f(x) = \sum_{k=1}^{K} \pi_k f(x \mid \theta_k), \quad \text{with } 0 < \pi_k < 1 \text{ and } \sum_{k=1}^{K} \pi_k = 1,$$

where $f(x \mid \theta_k)$ is the conditional pdf of class k and $\pi_k$
is the a priori probability of each class of a LSP vector. This algorithm
is iterative and has three main steps. We propose to describe it here in
the case of the Gaussian hypothesis (i.e., $\theta_k = (\mu_k, \Sigma_k)$,
where $\mu_k$ is the mean vector and $\Sigma_k$ is the covariance matrix
of class k).

Figure 2. Block diagram of an isolated word HMM recognizer
($\lambda^{(i)}$: the HMM model for the i-th word).

- Initialization step: We suppose that the number of classes K is known;
an initial solution for the parameters of the mixture is then extracted
from the histogram.

- Expectation step: It consists of the estimation of the a posteriori
probability $t_k^{(n)}(x_i)$ that the realization $x_i$ belongs to class
k at the n-th iteration:

$$t_k^{(n)}(x_i) = \frac{\pi_k^{(n)} f(x_i \mid \theta_k^{(n)})}{\sum_{j=1}^{K} \pi_j^{(n)} f(x_i \mid \theta_j^{(n)})}$$

- Maximization step: We build here the parameters needed for the next
step, in the following way:

$$\pi_k^{(n+1)} = \frac{1}{N} \sum_{i=1}^{N} t_k^{(n)}(x_i)$$

$$\mu_k^{(n+1)} = \frac{\sum_{i=1}^{N} x_i\, t_k^{(n)}(x_i)}{\sum_{i=1}^{N} t_k^{(n)}(x_i)}$$

$$\Sigma_k^{(n+1)} = \frac{\sum_{i=1}^{N} t_k^{(n)}(x_i)\,(x_i - \mu_k^{(n+1)})(x_i - \mu_k^{(n+1)})^T}{\sum_{i=1}^{N} t_k^{(n)}(x_i)}$$

for k = 1, ..., K.

4. The proposed non parametric EM
4.1. Estimation based on orthogonal expansions

The estimation of the pdf based on methods of Fourier analysis is
suitable for this situation. Let X be a random vector taking values in
the d-dimensional Euclidean space $R^d$ and suppose that the distribution
of X is described by a probability density function f. Given a sample
$X_1, \ldots, X_N$ of N independent observations of X, the estimator of f
is the probability density function:

$$\hat f_N(x) = \sum_{m_1=0}^{K_N} \cdots \sum_{m_d=0}^{K_N} \hat a_{m_1 \ldots m_d}\, e_{m_1}(x_1) \cdots e_{m_d}(x_d),$$

where

$$\hat a_{m_1 \ldots m_d} = \frac{1}{N} \sum_{i=1}^{N} e_{m_1}(X_{i,1}) \cdots e_{m_d}(X_{i,d}) \quad \text{for } m_j = 0, \ldots, K_N.$$

$\{e_m(x)\}$ is an orthonormal complete basis of $L^2(]a,b[)$ and $]a,b[$
is an interval of the real line. For simplicity we consider the same
$K_j = K_N$ for all j = 1, ..., d. This assumption does not induce a bad
behavior in the estimation of the parameters, since the orthogonal basis
functions used in the multivariate case are the products of the
univariate basis functions.

4.2. Description of the non parametric EM

This approach does not assume any knowledge of the kind of the
conditional pdf of the LSP parameters, so that:

$$\theta_k = (a^k_{(0,\ldots,0)}, \ldots, a^k_{(K_N,\ldots,K_N)}) \quad \text{and} \quad f(x \mid \theta_k) = \hat f_k(x).$$

a. Initialization step: We suppose that the number of classes K is
known; an initial solution for the parameters of the mixture is then
extracted from the histogram.

b. Expectation step: In this step, we estimate the a posteriori
probability that the LSP vector $x_i$ belongs to class k at the n-th
iteration by:

$$t_k^{(n)}(x_i) = \frac{\pi_k^{(n)} \hat f(x_i \mid \theta_k^{(n)})}{\sum_{j=1}^{K} \pi_j^{(n)} \hat f(x_i \mid \theta_j^{(n)})}$$

c. Maximization step: The a posteriori probability of each $x_i$ is
computed, so that, at the (n+1)-th iteration, we have:

$$\pi_k^{(n+1)} = \frac{1}{N} \sum_{i=1}^{N} t_k^{(n)}(x_i),$$

together with a truncation order $K^{(n+1)} = \mathrm{int}[\,\cdots\,]$
(the argument of the integer part is illegible in the source).

The Bayesian rule: After the mixture identification, the Bayesian rule
is applied in order to classify the speech signal according to its LSP
vector $x_i$:

$$k(x_i) = \arg\max_{1 \le k \le K} \pi_k f(x_i \mid \theta_k)$$

where $k(x_i)$ represents the label of the class of the vector $x_i$.
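
As a rough illustration of the orthogonal expansion estimate of section 4.1 and of the Bayesian rule, the following sketch works in one dimension on the interval ]0,1[ with a cosine basis. The basis choice, the function names (`cosine_basis`, `fit_coefficients`, `density`, `bayes_label`), the truncation order and the toy data are assumptions made for the example only; the paper does not specify them, and the EM iterations themselves are not reproduced here.

```python
import numpy as np

def cosine_basis(x, K):
    """Orthonormal cosine basis on ]0,1[: e_0 = 1, e_m = sqrt(2) cos(m pi x)."""
    x = np.asarray(x, dtype=float)
    m = np.arange(K + 1)
    E = np.sqrt(2.0) * np.cos(np.pi * m[None, :] * x[:, None])
    E[:, 0] = 1.0
    return E  # shape (len(x), K + 1)

def fit_coefficients(samples, K, weights=None):
    """Weighted sample means of the basis functions (first K+1 Fourier coefficients)."""
    E = cosine_basis(samples, K)
    if weights is None:
        weights = np.ones(len(E))
    return weights @ E / weights.sum()

def density(x, coeffs):
    """Truncated orthogonal expansion evaluated at the points x."""
    return cosine_basis(x, len(coeffs) - 1) @ coeffs

def bayes_label(x, priors, coeff_list):
    """Bayesian rule: pick the class maximizing prior * estimated density."""
    scores = np.stack([p * density(x, a) for p, a in zip(priors, coeff_list)])
    return np.argmax(scores, axis=0)

# Toy data (assumption): two classes concentrated near 0.2 and 0.8.
rng = np.random.default_rng(0)
class0 = rng.beta(2.0, 8.0, size=2000)
class1 = rng.beta(8.0, 2.0, size=2000)
coeffs = [fit_coefficients(class0, K=6), fit_coefficients(class1, K=6)]
priors = [0.5, 0.5]

grid = (np.arange(1000) + 0.5) / 1000          # midpoints of a uniform grid on ]0,1[
total_mass = density(grid, coeffs[0]).mean()   # numerical integral of the estimate
labels = bayes_label(np.array([0.1, 0.9]), priors, coeffs)
```

The optional `weights` argument is where the a posteriori probabilities of the expectation step would enter if this estimator were placed inside the non parametric EM loop. Note that a truncated expansion can take small negative values, so a practical implementation may need to clip it.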

For the database, the set of speech sequences is separated into two
parts: one for training, the other for testing. The database contains 10
digits (0 to 9) pronounced by 25 speakers, with 150 utterances in the
training set and 100 in the test set. The analog voice signal is
digitized at 8 kHz. The signal is multiplied by a 32 ms Hamming window.
The LSP coefficients are computed every 16 ms. Finally, once the set of
word HMMs has been designed and optimized, recognition of an unknown word
is performed by computing the probability of the observation sequence for
each word model and selecting the word whose model score is highest
(i.e., the highest likelihood). As we have seen here, the non parametric
aspect comes from the use of the orthogonal density estimates in the
mixture identification step, which is reduced to the estimation of the
first Fourier coefficients of these densities.

5.  Conclusions 

We have considered a speaker independent HMM based isolated word speech
recognition system. The speech signal has been represented as a time
sequence of LSP coefficients. We have shown that the conditional
distributions, with respect to a given state, are not close to a
parametric one (i.e., a Gaussian). We have suggested an unsupervised and
non parametric estimator based on orthogonal expansions to improve the
pdf representation.

Abstract

A new fast and robust HOS based algorithm for simultaneous
voiced/unvoiced detection and pitch estimation using 3-level binary
speech signals is presented. An accurate and reliable voiced/unvoiced
detection of a speech signal and the associated pitch period estimation
from the voiced part are made in coloured noise environments with low
SNR. The use of the 3-level binary speech signals dramatically reduces
the computational effort required in evaluating the third order
cumulant. The superior performance of the new algorithm over the
conventional autocorrelation method is demonstrated using real speech
signals in low SNR environments.


1.  Introduction 

Accurate and reliable voiced/unvoiced detection of a speech signal and
the associated pitch period estimation for the voiced part are crucial
preprocessing steps in many speech processing applications and are
essential in most analysis and synthesis (vocoder) systems. These include
automatic detection of the beginning and ending of an utterance in a long
recording, speech segmentation and automatic isolated word recognition
(AIWR) [1, 4, 7]. Many algorithms have been reported in the literature
for solving the detection and estimation problem using second order
statistics such as the autocorrelation, the cepstrum and the average
magnitude difference function (AMDF) [1, 4, 5]. A common problem with
these second order statistics algorithms is that they are sensitive to
various noises. Third order statistics have been shown to be particularly
insensitive to various noises such as Gaussian and coloured, sinusoidal
and car noise [6, 9]. HOS have been applied in [6] to speech signals for
pitch determination using the autocorrelation of the third order
cumulants. In [7] HOS have been used for end point detection of a speech
signal by using the maximum singular value of an appropriate cumulant
matrix. A voiced/unvoiced decision in the frequency domain using HOS has
been reported in [8]; it uses the properties of the bispectrum, which is
approximately zero for the fricative phonemes and has a complex structure
for the voiced phonemes. A main concern in using HOS in practice is the
excessive computation involved in its estimation. In this paper we
propose a new fast and robust 3-level binary HOS based algorithm for
simultaneous voiced/unvoiced detection and pitch estimation of speech
signals that can work satisfactorily in low SNR environments. In section
2 the new algorithm is described. In section 3 the simulation results
using real speech signals are presented and its performance is compared
to the conventional second order methods.

2.  The  Algorithm 

Figure 1. Block diagram of the NACC system.

The block diagram of the 3-level binary HOS based detection and
estimation system is shown in Fig. 1. The speech signal is segmented into
overlapping 30 ms frames. The system uses center clipping and infinite
peak clipping as a non linear spectrum flattening on the speech signal
[2, 3]. For each frame a clipping threshold is computed as follows:

$$C_L = K \min[c_{l_1}, c_{l_2}] \quad (1)$$

where, from computer simulations, an appropriate value for K is found to
be 0.2, and $c_{l_1}$ and $c_{l_2}$ are the maximum amplitudes in the
first and last third of the frame respectively. Thus a 3-level binary
speech signal is produced by center clipping and infinite peak clipping
the speech signal, with values of -1, 0, +1 depending on the relation of
the original speech sample to the clipping thresholds as follows:

$$x(n) = \begin{cases} 1 & \text{if } s(n) > C_L \\ -1 & \text{if } s(n) < -C_L \\ 0 & \text{otherwise} \end{cases} \quad (2)$$



where s(n) is the speech sample. Since the 1-d slice of the third order
cumulants is defined as:

$$c_3(\tau) = E[x(n)\,x(n)\,x(n+\tau)] \quad (3)$$

each combination in Eq. (3) can assume the following 3-level binary
values:

$$x(n)\,x(n)\,x(n+\tau) = \begin{cases} 0 & \text{if } x(n) = 0 \text{ or } x(n+\tau) = 0 \\ 1 & \text{if } x(n+\tau) = 1 \\ -1 & \text{if } x(n+\tau) = -1 \end{cases} \quad (4)$$

Thus, only a simple combinatorial logic circuit is required to compute
each term in the third order cumulant, and an up-down counter accumulates
the actual third order cumulant value of Eq. (3). The 3-level binary HOS
based detection and estimation system uses a normalized autocorrelation
function of the 1-d slice of the third order cumulants (NACC), defined
as:

$$NACC(\tau) = \frac{\sum_{n} c_3(n)\,c_3(n+\tau)}{\left[\sum_{n} c_3^2(n) \sum_{n} c_3^2(n+\tau)\right]^{1/2}} \quad (5)$$

The numerator and the denominator of Eq. (5) involve only a single
logical operation. To simultaneously detect the voiced/unvoiced region
and estimate the associated pitch period, for each frame the peak value
of the NACC is compared to a threshold, as shown in Fig. 1. If it is a
voiced frame, the pitch and its period are estimated directly from the
positions where the NACC has its maximum peaks.
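
The processing chain described in this section can be sketched in software as follows. This is only a minimal illustration under stated assumptions: the normalization used in `nacc`, the decision threshold of 0.5, the lag search range and the synthetic pulse-train frame are choices made for the example, and the function names are hypothetical; the paper targets a hardware realization with combinatorial logic and counters.

```python
import numpy as np

def three_level_clip(frame, k=0.2):
    """Center clip and infinite-peak clip a frame to the values -1, 0, +1."""
    frame = np.asarray(frame, dtype=float)
    third = len(frame) // 3
    # Threshold: k times the smaller of the peak amplitudes in the first and last thirds.
    c_l = k * min(np.abs(frame[:third]).max(), np.abs(frame[-third:]).max())
    return np.where(frame > c_l, 1, np.where(frame < -c_l, -1, 0))

def cumulant_slice(x, max_lag):
    """1-d slice c3(tau) = mean of x(n) x(n) x(n + tau), for tau = 0..max_lag."""
    n = len(x) - max_lag
    x2 = x[:n] * x[:n]                      # 0 or 1 for a 3-level signal
    return np.array([np.mean(x2 * x[t:t + n]) for t in range(max_lag + 1)])

def nacc(c3, max_lag):
    """Normalized autocorrelation of the cumulant slice (assumed normalization)."""
    n = len(c3) - max_lag
    out = np.zeros(max_lag + 1)
    for t in range(max_lag + 1):
        a, b = c3[:n], c3[t:t + n]
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
        out[t] = np.sum(a * b) / denom if denom > 0 else 0.0
    return out

def detect(frame, min_lag, max_lag, threshold=0.5):
    """Return (voiced, pitch_lag) for one frame."""
    x = three_level_clip(frame)
    c3 = cumulant_slice(x, 2 * max_lag)
    r = nacc(c3, max_lag)
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return bool(r[lag] > threshold), lag

fs = 8000
n = np.arange(240)                          # one 30 ms frame at 8 kHz
frame = np.exp(-(n % 40) / 4.0) - 0.1       # toy pulse train with a period of 40 samples
voiced, lag = detect(frame, min_lag=20, max_lag=60)
pitch_hz = fs / lag
```

Because the clipped signal only takes the values -1, 0 and +1, each product in `cumulant_slice` is itself -1, 0 or +1, which is what makes a counter-based hardware implementation possible.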


3.  Experimental  Results 

To demonstrate the performance of the 3-level binary HOS NACC system for
simultaneous voiced/unvoiced detection and pitch estimation, an utterance
of 'six' is used, which has three unvoiced/voiced regions. Additive
coloured Gaussian noise at 5 dB and 0 dB SNR is used for the simulations,
as shown in Fig. 2. For voiced/unvoiced detection the maximum peak of the
NACC in Eq. (5) is recorded for each frame, as shown in (a) and (c) of
Fig. 3 respectively. From the figures it is clear that a level close to
zero signifies an unvoiced region while a significant value signifies a
voiced region. From the voiced region the pitch is simultaneously
estimated from the periodicity of the NACC in Eq. (5): for a voiced
frame, the complete NACC($\tau$) from that frame is plotted for the
utterance, as shown in Fig. 3(b). Clearly the pitch period and location
can be simultaneously estimated from the index where NACC($\tau$) takes
its maximum value.

To compare the performance of the 3-level binary HOS NACC system at low
SNR, such as 5 dB and 0 dB, with the conventional second order statistics
(autocorrelation, 'auto') method [2, 5], the voiced/unvoiced regions for
the utterance of 'six' are plotted in Fig. 3(a), (c) for the NACC system
and in Fig. 3(d) and (e) for the conventional auto method. Comparing
these figures to (a) and (c) of Fig. 3, we can see that the conventional
auto method has failed to identify the voiced/unvoiced part while the new
3-level binary HOS NACC system maintains its good performance in the
presence of a high level of noise. The use of the normalized
autocorrelation in the new system works better than the direct
autocorrelation since it accounts for the non-stationarity in the speech
signal [5]. This will reduce the possibility of pitch doubling or
tripling encountered in autocorrelation based algorithms due to more
similarities in these lags than that of the pitch period.

4. Conclusions


A fast and robust 3-level binary HOS NACC system for speech signals has
been described for accurate and reliable voiced/unvoiced detection and
simultaneous pitch period estimation for the voiced part. The algorithm
can easily be implemented in digital hardware using simple combinatorial
logic, i.e., an up-down counter can be used to compute each cumulant
point. The performance of the new algorithm has been assessed using a
real speech signal at low SNR. The robustness of the NACC algorithm has
been demonstrated and compared to a conventional second order algorithm
for high level coloured Gaussian noise.

Figure 2. (a) Clean real speech signal for the word 'six', (b) with 5 dB
coloured noise, (c) with 0 dB coloured noise.

Figure 3. Simulation results for the word 'six' using NACC with 0 dB
coloured noise. (a) NACC V/U detection. (b) Pitch estimation. (c) NACC
V/U detection with 5 dB. (d) Auto V/U detection with 5 dB. (e) Auto V/U
detection with 0 dB.

ABSTRACT

This paper addresses performance issues in the source separation
problem. By drawing on the theory of optimal statistic matching, we
derive new contrast functions which are optimal among those involving a
given set of cumulants. In low noise, the optimal combination of a
particular set of cumulants is shown to be parameter independent and can
be pre-computed. We give specific examples in closed form for several
choices of 2nd and 4th order cumulants. The resulting performance is
investigated as a function of the SNR and of the non-Gaussianity of the
source signals and further compared to suboptimal approaches.

1.  INTRODUCTION 

Source separation algorithms assume a linear model for a vector x(t) of
observations:

$$x(t) = A s(t) + n(t) \quad (1)$$

where matrix A is m × n with full column rank, n(t) is additive noise
and s(t) is an n × 1 vector of independent components
$s_1(t), \ldots, s_n(t)$: the so-called 'source signals'.

Source separation consists of recovering the source signals and/or
estimating the 'mixing matrix' A without using a priori information about
the latter. In this paper, we focus on approaches based on cumulant
matching and on contrast functions. These two approaches are briefly
reviewed below.

Contrast functions have been introduced for source separation by Comon
in [1]. The solution to source separation is defined by the separating
matrix B such that its output y = Bx shows the largest possible
'contrast'. For instance, Comon in his ICA approach [1] suggests
maximizing

$$c(B) = \sum_i |\mathrm{Cum}(y_i, y_i^*, y_i, y_i^*)|^2 \quad (2)$$

subject to $R_{yy} = I_n$ (the constraint must be modified to take noise
into account). A similar contrast is optimized by the joint
diagonalization algorithm described in [2] as JADE.

Cumulant matching approaches [3, 4] to source separation are a specific
case of statistic matching. Denote by $\hat T$ a vector of statistics
such that $E_\theta \hat T = T(\theta)$, where $\theta$ is an unknown
vector parameterizing the distribution of $\hat T$. An estimate
$\hat\theta$ of $\theta$ may be obtained as
$\hat\theta = \arg\min_\theta c(\theta)$, where $c(\theta)$ is some
measure of discrepancy between $\hat T$ and $T(\theta)$, like

$$c(\theta) = (\hat T - T(\theta))^\dagger\, W\, (\hat T - T(\theta)) \quad (3)$$

with W a positive matrix.

Contrasts and matching. This paper draws on the links between these two
approaches. If $\hat T$ is a vector of sample cumulants of x(t) and the
unknown parameter $\theta$ can be identified with matrix A, then an
objective like (3) may be turned into a contrast function. The main
benefit of this perspective is that, given a particular set of
cumulants, the theory of optimal statistic matching indicates the optimal
weighting to apply to these cumulants.

This paper extends the work presented in [5] by considering optimal
blind estimation of the mixing matrix from both 2nd and 4th order
cumulants (ref. [5] considered only 4th order cumulants). Including 2nd
order statistics is important in the case where little information is
available in higher-order statistics. It is also more robust to the
effects of noise.

2.  CUMULANT  MATCHING  AND 
CONTRASTS 

2.1.  Assumptions. 

To keep the exposition as simple as possible, we assume that the noise
covariance matrix and the cumulants of orders 4, 6 and 8 of the sources
are known. They are denoted

$$k_p = \mathrm{Cum}(s_p, s_p^*, s_p, s_p^*), \quad (4)$$

$$h_p = \mathrm{Cum}(s_p, s_p^*, s_p, s_p^*, s_p, s_p^*), \quad (5)$$

$$o_p = \mathrm{Cum}(s_p, s_p^*, s_p, s_p^*, s_p, s_p^*, s_p, s_p^*), \quad (6)$$

for p = 1, ..., n. We also assume that n = m (the case of m > n can be
handled by first estimating the signal subspace, which has little effect
in the source separation problem). We note that the source signals may be
assumed to have unit variance:

$$E|s_p(t)|^2 = 1, \quad p = 1, \ldots, n \quad (7)$$

because the amplitude of each independent component can be integrated in
the corresponding column of A. The following moments will appear in the
sequel:

$$\alpha_p = o_p + 8h_p + 17k_p^2 + 20k_p + 4, \quad (8)$$

$$\gamma_p = \cdots \quad (9)$$

$$\lambda_p = h_p + 4k_p. \quad (10)$$

For further reference, we mention that $\alpha_p = 0$ for constant
modulus sources (i.e. $|s_p|^2 = 1$ a.s.).

2.2.  Optimal  matching  and  contrast  functions. 

It is well known (see for instance [3]) that, under the appropriate
assumptions, the optimal matrix W for weighting in (3) is the inverse of
the (asymptotic) covariance of the vector of statistics $\hat T$:

$$W_{\mathrm{opt}} = \mathrm{Cov}^{-1}\{\hat T\}. \quad (11)$$

Let then $\hat T_x$ denote a vector of sample cumulants of x and consider
how optimal cumulant matching instantiates in the source separation case
where, with the above assumptions, the unknown parameter is the mixing
matrix, i.e. $\theta = A$. The optimal way of matching the estimated
cumulants $\hat T_x$ to their theoretical values $T_x(A)$ is to minimize

$$c(A) = (\hat T_x - T_x(A))^\dagger\, \mathrm{Cov}^{-1}(\hat T_x)\, (\hat T_x - T_x(A)). \quad (12)$$

We now make a key step: thanks to the multi-linearity of the cumulants,
matrix A factors out to some extent in (12). As a matter of fact, setting
$B = A^{-1}$ and y = Bx, criterion (12) may be rewritten [6] as

$$c(B) = (\hat T_y - T_s)^\dagger\, \mathrm{Cov}^{-1}(\hat T_{\tilde s})\, (\hat T_y - T_s) \quad (13)$$

which depends on B via the random vector y = Bx and via the random
vector

$$\tilde s = s + Bn. \quad (14)$$

The net result is that an optimal criterion measuring cumulant mismatch
at the array output (i.e. for the r.v. x) has been turned into a contrast
function measuring the mismatch between the 'true' cumulants $T_s$ of the
sources and the sample cumulants $\hat T_y$ estimated at the output
y = Bx of the separator.

The beauty of this manoeuvre is that for high enough SNR, we have
$\tilde s \approx s$, so that the criterion (13) is approximately equal
to:

$$c(B) = (\hat T_y - T_s)^\dagger\, \mathrm{Cov}^{-1}(\hat T_s)\, (\hat T_y - T_s) \quad (15)$$

with the key feature that the optimal weighting matrix
$\mathrm{Cov}^{-1}(\hat T_s)$ does not depend on A: it is, as a matter of
fact, a constant matrix which can be evaluated once for all for a given
distribution of the sources.

Further analysis is possible because, thanks to the assumption of
independent sources, matrix $\mathrm{Cov}(\hat T_s)$ has a nearly
diagonal structure when $\hat T_s$ is a vector of sample cumulants [7].
It follows that it can be 'manually' inverted. This leads, once a
specific set of cumulants $\hat T$ has been chosen, to simple contrast
functions in which cumulant mismatch is weighted on a statistically sound
basis.

3.  OPTIMAL  CONTRAST  FUNCTIONS. 

Some examples are investigated in the next section where we consider a
cumulant statistic $\hat T$ containing both 2nd and 4th order cumulants
(extending the analysis of an earlier paper [5] where $\hat T$ could
include only 4th order cumulants). The empirical cumulants of vector y
are denoted

$$\hat r_{ij} = \widehat{\mathrm{Cum}}(y_i, y_j^*) \quad (16)$$

$$\hat q_{ij}^{kl} = \widehat{\mathrm{Cum}}(y_i, y_j^*, y_k, y_l^*) \quad (17)$$

where $\widehat{\mathrm{Cum}}$ is a standard cumulant estimator.
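
One common form of such sample cumulant estimators for zero-mean complex data is sketched below. The paper does not state which estimator it uses, so the moment-based formulas and the function names `cum2` and `cum4` are assumptions made for illustration; the toy check uses a unit-variance constant-modulus (QAM4) constellation in which every point appears equally often.

```python
import numpy as np

def cum2(a, b):
    """Sample estimate of Cum(a, b*) for zero-mean data."""
    return np.mean(a * np.conj(b))

def cum4(a, b, c, d):
    """Sample estimate of Cum(a, b*, c, d*) for zero-mean data."""
    bc, dc = np.conj(b), np.conj(d)
    return (np.mean(a * bc * c * dc)
            - np.mean(a * bc) * np.mean(c * dc)
            - np.mean(a * dc) * np.mean(c * bc)
            - np.mean(a * c) * np.mean(bc * dc))

# Unit-variance QAM4 samples, each constellation point repeated 250 times.
qam4 = np.tile(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), 250)
r_hat = cum2(qam4, qam4)                  # second order auto-cumulant (variance)
q_hat = cum4(qam4, qam4, qam4, qam4)      # fourth order auto-cumulant (kurtosis)
```

For this constant modulus source the variance estimate equals 1 and the kurtosis estimate equals -1, the lowest kurtosis a unit-variance circular source can have.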

In order to carry out a detailed investigation, we make the following
simplifying assumptions. All the processes are assumed to be i.i.d. and
circularly distributed; sources have non-zero kurtosis: $k_p \neq 0$ for
p = 1, ..., n; the noise is normally distributed, independent of the
signals, with covariance matrix $\sigma^2 I$.

On this basis, we consider various sets of cumulants. We do not present
general analytical results when $\hat T$ is the whole set of 2nd and 4th
order cumulants (except in sec. 3.1) because we prefer to focus on more
specific cases which can be detailed and because room is lacking for an
exhaustive report. For the same reason, we leave out the hard core
computations, namely the explicit inversion of $\mathrm{Cov}(\hat T)$.
More details will be found in [6].

3.1.  The  normal  limit 

When the sources are close to being normally distributed, our analysis
leads to a strikingly simple conclusion because the limit form of
$\mathrm{Cov}(\hat T)$ is itself very simple. The optimal criterion
involving all 2nd and 4th order cumulants is

$$c(B) = 4 \sum_{pq} |\hat r_{pq} - \delta_{pq}|^2 + \sum_{pqrs} |\hat q_{pq}^{rs} - k_p\, \delta_{pqrs}|^2 \quad (18)$$

i.e. the mismatch of 2nd order cumulants receives a 4 times heavier
penalty than the mismatch of 4th order cumulants. We have used the
$\delta$ symbol which evaluates to 1 when all its indices are equal and
to 0 otherwise.

Another limit case, which is in some sense complementary
to the normal limit, is when the sources have a maximally
low kurtosis. This is obtained when the sources have
a  constant  modulus.  In  no  noise,  there  is  infinite  weight 
on  the  auto-cumulant  terms  as  well  as  on  those  containing 
cross-cumulants  of  the  form  ,  and  on  certain  linear 
combinations  of  the  2nd  and  4th  order  cumulants  rij ,  qij 
and  It  would  be  interesting  to  determine  if  the  CMA 
criterion  which  involves  only  2nd  and  4th  order  moments 
and  is  super-efficient  in  the  constant  modulus  case,  could 
be  obtained  as  the  limit  of  an  optimally  weighted  criterion 
involving  a  specific  subset  of  2nd  and  4th  order  cumulants. 

3.2.  Autocumulants 

Matching only auto-cumulants is to take

$$\hat T_y = [\hat r_{11}, \hat r_{22}, \ldots, \hat r_{nn}, \hat q_{11}^{11}, \hat q_{22}^{22}, \ldots, \hat q_{nn}^{nn}],$$

$$T_s = [1, 1, \ldots, 1, k_1, k_2, \ldots, k_n].$$

The best criterion based on these cumulants turns out (maybe not
surprisingly) to be a sum of criteria, each term being concerned with a
particular output, i.e. $C_{\mathrm{auto}}(B) = \sum_{p=1}^{n} c_p(B)$
with

$$c_p(B) = (k_p + 1)\,|\hat q_{pp}^{pp} - k_p|^2 + \alpha_p\,|\hat r_{pp} - 1|^2 - 2\lambda_p\,(\hat r_{pp} - 1)(\hat q_{pp}^{pp} - k_p) \quad (19)$$

which can also be written as a sum of squares:

$$c_p(B) = |\xi_p(\hat q_{pp}^{pp} - k_p) - \zeta_p(\hat r_{pp} - 1)|^2 + \beta_p\,|\hat r_{pp} - 1|^2 \quad (20)$$

where $\xi_p^2 = k_p + 1$, $\zeta_p = \lambda_p/\xi_p$ and
$\beta_p = \alpha_p - \zeta_p^2$. Again, if the source distributions are
close to normal, then all cumulants are close to 0 and, according to (8),
$\alpha_p \to 4$. Thus, we have in the normal limit

$$C_{\mathrm{auto}}(B) = \sum_{p=1}^{n} 4\,|\hat r_{pp} - 1|^2 + |\hat q_{pp}^{pp} - k_p|^2 \quad (21)$$

where the coupling between 2nd and 4th order cumulant estimates has
disappeared.

3.3.  All  2nd  and  auto-4th  order  cumulants 

This criterion is interesting because it relates to Comon's criterion in
that the same set of cumulants is used, except that these are combined in
an optimal way. The performance relationship between the two is
illustrated in the following section. The criterion itself, optimally
involving the whole 2nd order information and only the auto 4th order
cumulants, is

n 

C2  +  aMto4(^)  =  Cauto(-B)  -f-  ^  ^  (^2) 

It is seen that the cross-correlation terms add very simply
to the $C_{\mathrm{auto}}(B)$ criterion.

3.4.  4th  order  cumulants  only 

The case where the whole 4th order cumulant set

$$\hat T_y = \{\hat q_{ij}^{kl} \mid 1 \le i, j, k, l \le n\} \quad (23)$$

is involved in the estimation was investigated in [5]. For
two identically distributed sources, we obtained

ciB)  -  + 

"^(^+1)2  (A;  +2)2 

For instance, in the case of two QAM16 sources, one has
k = -0.68, h = 2.08, o = -13.5184 (we assume that
the phase of these constellations is randomized and we recall
that, by convention, the sources have unit variance).
The optimal criterion based on 4th order cumulants for the
source separation of two QAM16 signals is then approximately,
at low noise:

C4(5)  =  O.T2{\q\\  +  0.f)8|-  +  |()i  +0.68|^)  +  9,77|gJ^|^ 

+  i  +  1.75(1?“!^  +  +  3.721?“  -  ill\\ 

This  shows  that  in  this  case  the  cross-cumulant  q\l  is  a 
more  reliable  measure  of  independence  than,  say,  qH. 
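
The QAM16 values quoted above can be checked numerically. The sketch below assumes a circular source (which the randomized phase guarantees), so that the cumulants of orders 4, 6 and 8 reduce to combinations of the even absolute moments; these moment-to-cumulant formulas are standard results for circular variables and are not given in the paper. The last line evaluates $\alpha_p$ as reconstructed in (8).

```python
import numpy as np

# Unit-variance 16-QAM constellation, all 16 points equally likely.
levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()
qam16 = qam16 / np.sqrt(np.mean(np.abs(qam16) ** 2))

def moment(order):
    """Absolute moment E|s|^order of the constellation."""
    return np.mean(np.abs(qam16) ** order)

m2, m4, m6, m8 = moment(2), moment(4), moment(6), moment(8)

# Cumulants of a circular variable expressed through its even absolute moments.
k = m4 - 2 * m2**2
h = m6 - 9 * m4 * m2 + 12 * m2**3
o = m8 - 16 * m6 * m2 - 18 * m4**2 + 144 * m4 * m2**2 - 144 * m2**4

alpha = o + 8 * h + 17 * k**2 + 20 * k + 4   # moment (8)
```

The same formulas give k = -1, h = 4, o = -33 and hence alpha = 0 when all absolute moments equal 1, which agrees with the remark in section 2.1 that $\alpha_p = 0$ for constant modulus sources.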

3.5.  Link  to  suboptimal  criteria 

Now as pointed out before, a number of algorithms in the literature
(ICA, JADE) can be interpreted in terms of statistic matching. However
these do not use the optimum weighting. Rather they are based on a hard
prewhitening in the sense that the (empirical) covariance matrix of the
signals at the output of the separating matrix is constrained to be
exactly the identity matrix, leaving no room for an 'approximate
decorrelation'.

This can be interpreted as weighted statistic matching in which
virtually infinite weights are put on the second order statistic terms
and flat weights on the 4th order terms. In practice the limit as one
increases the weight on the second order terms can be taken. Results in
the next section demonstrate the equivalence of this weighted statistic
with the ICA/JADE contrasts.

Use of this suboptimal weighting results in a performance loss which we
illustrate in the next section. In addition we will consider the flat
weighting, which is simply statistic matching with equal weights applied
to all the statistics.

4.  ASYMPTOTIC  PERFORMANCE 

We shall be using the interference rejection at the output of the
separating matrix as the measure of performance in our analysis. The
relevant figure of merit is the ISI (inter symbol interference), which is
defined pairwise between two sources p and q as the ratio of the power of
source q to that of source p at the channel output corresponding to p;
since this is proportional to 1/N, where N is the number of samples, we
shall really be considering the ISI rate, given by N × ISI, in the
subsequent expressions and plots.

Indeed the performance can be expressed in terms of the perturbation of
the global system from the identity, $BA = I + \mathcal{E}$. Then
$\mathrm{ISI}_{pq} = E|\mathcal{E}_{pq}|^2$ and this may be computed from
the covariance of $\mathcal{E}$, which itself is given by [...], where D
is the derivative $dT/d\mathcal{E}$.
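
For a single realization of the separating matrix, the pairwise interference can be read directly off the global system, as in the following sketch. The function name `isi_matrix`, the normalization by the diagonal entry and the numerical values of A and of the perturbation are assumptions chosen for illustration; the expectation over realizations and the scaling by N that define the ISI rate are not reproduced.

```python
import numpy as np

def isi_matrix(B, A):
    """Pairwise interference power |G_pq|^2 / |G_pp|^2 for the global system G = B A."""
    G = B @ A
    power = np.abs(G) ** 2
    isi = power / np.diag(power)[:, None]
    np.fill_diagonal(isi, 0.0)
    return isi

A = np.array([[1.0, 0.5], [0.3, 1.0]])          # toy mixing matrix
E = np.array([[0.0, 0.01], [-0.02, 0.0]])       # small perturbation of the identity
B = (np.eye(2) + E) @ np.linalg.inv(A)          # imperfect separator, so that B A = I + E
isi = isi_matrix(B, A)
```

Here `isi[p, q]` is the squared modulus of the (p, q) entry of the perturbation, since the diagonal of the global system equals 1 in this example.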

Two  performance  bounds. 

Denote by $\rho_{pq}$ the (p, q) entry of matrix … and let
$\sigma$ be the noise power. Any source separation method using hard
whitening has pairwise lower bounded rejection rates [8].
We call this the 'pre-whitening bound'. For n = m, it is

ISIpq  +  ISI^p  >  ~{1  +  crppp)(l  +  apgg)  (25) 

Another  bound  is  provided  by  computing  what  would  be 
the  ISI  if  matrix  A  was  identified  knowing  the  source  signals. 
This  is  the  so-called  I/O  (input/output)  bound.  It  is: 

ISIp<j!  ^  (X ppp  (2^) 

Some numerical evaluations are given below for illustration. They are
computed for a 2 × 2 matrix $A = [a_1, a_2]$ such that $|a_1| = |a_2|$
and with the values of $\rho_{ij}$ indicated on the plot. We use QAM4 and
QAM16 distributions with a randomized phase (to ensure circularity).

Optimal contrasts. Figure 1, for the case of two identical sources,
shows the effect of strictly increasing information as we go from the
optimal criterion using the 4th order cumulants to that involving the
entire second and 4th order cumulants to the input-output bound. The
optimal criterion involving all the second order and the 4th order auto
cumulants is also shown. We note that for QAM4 at good SNR, optimal
matching of 2nd and 4th order cumulants is close to I/O performance
(there is a ratio of 2 in terms of ISI); in this situation including 2nd
order information seems crucial for good performance (see how the curve
for 4th order levels off at increasing SNRs). These conclusions do not
apply to the QAM16 case (there is clearly a 'constant modulus effect'
here).

Suboptimal contrasts. Figure 2 indicates the performance of actual JADE/ICA contrast-based algorithms compared to the suboptimum criterion employing flat weighting. The bound for pre-whitening is included for reference. We note that the use of the optimum weights instead of the hard weights for pre-whitening overcomes that bound. Moreover, the actual performance of the above algorithms can be dominated by this effect at high SNR, a point which is more evident in the upper panel for two QAM4 sources.

Figure 1. ISI: QAM: 4+4 : 16+16

Figure 2. ISI: 4+4 : 16+16

Effect of source distributions. We round off this discussion with an illustration of the effect on performance as the source distribution varies from the constant modulus limit to the Gaussian limit. Figure 3 illustrates the variation of ISI rate as we start with two QAM4 sources and make them progressively more Gaussian by adding a Gaussian component to the source with a relative amplitude of t, which can then be treated as a "Gaussianity parameter". Note how the pre-whitening loss constrains the performance of the JADE/ICA algorithms at small values of t, while the optimum 4th order criterion does uniformly worse. Further note the flattening of the optimum 2+4 curve near the CM limit and the transition near t = 1, corresponding to the Gaussian component effectively smearing out the discrete nature of the distribution. On the other hand, we see that ICA/JADE contrasts do as well as optimal matching of 2nd and 4th order cumulants as the distribution of the sources gets close to normality.


Figure 3. $\mathrm{ISI}_{12}$ as two identical QAM4 sources become increasingly Gaussian.

CONCLUSION

This paper develops the link between contrasts for source separation and criteria based on optimal statistic matching. A number of optimal and sub-optimal criteria are proposed, studied and compared. The ICA contrasts of Comon and of Cardoso are investigated within the same framework; it is seen from the examples that the primary cause of performance loss of those algorithms relative to the optimum criteria at high SNR is the hard pre-whitening. The effect of source distributions on performance is also illustrated; we find in particular that JADE/ICA contrasts are very suboptimal for constant modulus sources but tend to be optimal as the source distributions are pulled from this limit case. More illustrative examples will be presented at the workshop.
Abstract

In this paper, an algorithm using the well-known notch filter and an algorithm using a peak filter are proposed to estimate the frequencies of sinusoidal signals from a given set of Gaussian noise corrupted measurements y(n), provided that the number of sinusoids is known in advance. The former processes y(n) such that a single fourth-order cumulant of the notch filter output is minimum in absolute value, while the latter processes y(n) such that the same fourth-order cumulant of the peak filter output is maximum in absolute value. Then the unknown frequencies are obtained from the optimum notch filter and the optimum peak filter, respectively. A performance analysis of the proposed two algorithms is then presented, followed by some simulation results for a performance comparison of the proposed algorithms and Swami and Mendel's SVD low-rank approximation method.


1.  Introduction 

Estimation of parameters of sinusoidal signals is the problem of estimating frequencies $0 < \omega_i < \pi$ and amplitudes $A_i > 0$ from a given set of noisy measurements modeled as follows:

$$y(n) = \sum_{i=1}^{p} A_i \cos(\omega_i n + \phi_i) + w(n) \qquad (1)$$

where p is the total number of sinusoids, the $\phi_i$'s are random phases and w(n) is additive noise. This is a well defined problem in some statistical signal processing areas such as noise and interference cancellation and estimation of direction of arrival (DOA) of narrowband source signals in sonar and radar arrays. Usually, frequency estimation is followed by amplitude estimation because the former often resorts to a nonlinear search procedure while the latter can be solved from a set of linear equations once the $\omega_i$'s are estimated. There have been a number of correlation (second-order statistics) based algorithms reported for the estimation of the $\omega_i$'s, such as Pisarenko's harmonic decomposition procedure [1], Tufts and Kumaresan's method [2], the overdetermined Yule-Walker method [3] and the maximum-likelihood method [4]. Chicharo and Ng [5] proposed an adaptive notch filtering approach for the enhancement and tracking of sinusoids in additive noise. The transfer function of notch filters (IIR filters) of order equal to 2p is given by

$$H_p(z) = \frac{\prod_{i=1}^{p}\left(1 + \beta a_i z^{-1} + \beta^2 z^{-2}\right)}{\prod_{i=1}^{p}\left(1 + \alpha a_i z^{-1} + \alpha^2 z^{-2}\right)} \qquad (2)$$

where $0 < \beta \le 1$ and $0 < \alpha < \beta$. The $\omega_i$'s are obtained by solving for the roots of the numerator polynomial of the adaptive notch filter.

Higher-order ($\ge 3$) statistics, known as cumulants, have been used for frequency estimation of sinusoidal signals when measurement noise is Gaussian because all higher-order cumulants of Gaussian noise are equal to zero. Thus cumulant based frequency estimation algorithms [6-8] are insensitive to additive Gaussian noise. In this paper, the notch filter and a peak filter, each using a single fourth-order cumulant, are proposed for frequency estimation of sinusoidal signals. A performance analysis of the proposed frequency estimation algorithms (one using the notch filter and the other using a peak filter) is presented, followed by some simulation results.

This work is supported by the National Science Council under Grant NSC 85-2213-B-007-012.


2.  Cumulant based harmonic retrieval using notch filters and peak filters

Assume that we are given a set of noisy measurements $y(n)$, $n = 0, 1, \ldots, N-1$, modeled by (1) under the following assumptions:

(A1) The number p of sinusoids is known a priori; amplitudes $A_i > 0$ and frequencies $0 < \omega_i < \pi$, $i = 1, \ldots, p$, are unknown.

(A2) Measurement noise w(n) is Gaussian with unknown statistics.

(A3) Phases $\phi_i$ are i.i.d. random variables with a uniform probability density function over $[-\pi, \pi)$ and they are statistically independent of w(n).

Let $C_{M,e}(k_1, \ldots, k_{M-1})$ denote the Mth-order cumulant function of a non-Gaussian signal e(n). We need the following proposition, on which the two frequency estimation algorithms to be presented are based.

Proposition 1. Let e(n) be the output of a linear time-invariant system H(z) with input y(n) given by (1) under the assumptions (A1) through (A3), i.e.,

$$e(n) = y(n) * h(n) = \sum_{k=-\infty}^{\infty} h(k)\,y(n-k) \qquad (3)$$

where h(n) is the impulse response of the system. Then

$$C_{4,e}(0,0,0) = -\frac{3}{8}\sum_{i=1}^{p} A_i^4 \left|H(e^{j\omega_i})\right|^4 \qquad (4)$$

A.  Notch filter based algorithm:

By Proposition 1, one can infer the following fact:

(F1) Let e(n) be the output signal given by (3) of the notch filter $H_p(z)$ with $\beta = 1$ given by (2). Then $|C_{4,e}(0,0,0)| = \min\{|C_{4,e}(0,0,0)|\} = 0$ occurs only when $|H_p(e^{j\omega_i})| = 0$ for all i, i.e.,

$$a_i = -2\cos(\omega_i) \qquad (5)$$

Let $\hat{C}_{4,e}(0,0,0)$ denote the fourth-order sample cumulant associated with $C_{4,e}(0,0,0)$. By (F1), we propose the following frequency estimation algorithm:

Algorithm 1:

(S1) Let e(n) be the output signal given by (3) of the notch filter $H_p(z)$ ($\beta = 1$) given by (2). Find the optimum parameters $\hat{a}_i$, $i = 1, \ldots, p$, of $H_p(z)$ such that $|\hat{C}_{4,e}(0,0,0)|$ is minimum.

(S2) Obtain $\hat{\omega}_i$ by (5), i.e.,

$$\hat{\omega}_i = \arccos(-\hat{a}_i/2) \qquad (6)$$


B.  Peak filter based algorithm:

The peak filter used for frequency estimation is an IIR filter with transfer function

$$V_p(z) = \frac{\prod_{i=1}^{p}\left(1 + \beta\alpha a_i z^{-1} + \beta^2\alpha^2 z^{-2}\right)}{\prod_{i=1}^{p}\left(1 + \alpha a_i z^{-1} + \alpha^2 z^{-2}\right)} \qquad (7)$$

where $0 < \alpha < 1$ and $0 < \beta < 1$. The peak filter differs from the notch filter in that each pair of complex conjugate poles (with magnitude $\alpha$) is closer to the unit circle than the associated pair of complex conjugate zeros (with magnitude $\alpha\beta < \alpha$).

Again, by Proposition 1, one can infer the following fact:

(F2) Let e(n) be the output signal given by (3) of the peak filter $V_p(z)$ given by (7). Then $|C_{4,e}(0,0,0)| = \max\{|C_{4,e}(0,0,0)|\}$ occurs when $a_i$, $i = 1, \ldots, p$, of $V_p(z)$ are given by (5).

The following frequency estimation algorithm is due to (F2):

Algorithm 2:

(S1) Let e(n) be the output signal given by (3) of the peak filter $V_p(z)$ given by (7). Find the optimum parameters $\hat{a}_i$, $i = 1, \ldots, p$, of $V_p(z)$ such that $|\hat{C}_{4,e}(0,0,0)|$ is maximum.

(S2) Obtain $\hat{\omega}_i$ using (6).


To find the optimum $\hat{a}_i$ required in (S1) of the proposed two algorithms, we have to resort to iterative optimization algorithms because

$$\hat{C}_{4,e}(0,0,0) = \frac{1}{N}\sum_{n=0}^{N-1} e^4(n) - 3\left(\frac{1}{N}\sum_{n=0}^{N-1} e^2(n)\right)^2 \qquad (8)$$

is a highly nonlinear function of the $a_i$. A gradient type iterative algorithm is used to search for the optimum $\mathbf{a} = (a_1, \ldots, a_p)^T$. At the nth iteration, $\mathbf{a}$ is updated by

$$\mathbf{a}(n) = \mathbf{a}(n-1) \pm \mu \left.\frac{\partial |\hat{C}_{4,e}(0,0,0)|}{\partial \mathbf{a}}\right|_{\mathbf{a}=\mathbf{a}(n-1)} \qquad (9)$$

where $\mu$ is a small positive constant, and "$-$" is for Algorithm 1 and "$+$" is for Algorithm 2, respectively. An initial condition for $\mathbf{a}(0)$ is needed to initialize the iterative algorithm given by (9). Swami and Mendel's method [6] can be used to obtain an estimate for each $\omega_i$, and the associated $a_i$ computed by (5) can be used for $\mathbf{a}(0)$.
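As an illustration only, the sample cumulant of Eq. (8) and one gradient step of Eq. (9) for Algorithm 2 can be sketched for a single sinusoid (p = 1). The function names, the central-difference derivative, the step size and the test signal are assumptions made for this sketch; the paper does not say how the derivative in (9) is evaluated.

```python
import numpy as np

def biquad(y, b, a):
    """Second-order IIR section in direct form (a[0] is assumed to be 1)."""
    e = np.zeros(len(y))
    for n in range(len(y)):
        acc = b[0] * y[n]
        if n >= 1:
            acc += b[1] * y[n - 1] - a[1] * e[n - 1]
        if n >= 2:
            acc += b[2] * y[n - 2] - a[2] * e[n - 2]
        e[n] = acc
    return e

def peak_filter(y, a1, alpha=0.99, beta=0.9):
    """One section of Eq. (7): zeros at radius alpha*beta, poles at radius alpha."""
    zr = alpha * beta
    return biquad(y, (1.0, zr * a1, zr ** 2), (1.0, alpha * a1, alpha ** 2))

def sample_c4(e):
    """Fourth-order sample cumulant at zero lags, Eq. (8)."""
    return np.mean(e ** 4) - 3.0 * np.mean(e ** 2) ** 2

def objective(y, a1):
    """|C4| of the peak filter output, the quantity Algorithm 2 maximises."""
    return abs(sample_c4(peak_filter(y, a1)))

def gradient_update(y, a1, mu, h=1e-4):
    """One ascent step of Eq. (9) using a central difference (an assumption)."""
    grad = (objective(y, a1 + h) - objective(y, a1 - h)) / (2.0 * h)
    return a1 + mu * grad

# Hypothetical test signal: one unit-amplitude sinusoid in white Gaussian noise.
rng = np.random.default_rng(0)
N, omega1 = 2048, 0.4 * np.pi
n = np.arange(N)
phase = rng.uniform(-np.pi, np.pi)
y = np.cos(omega1 * n + phase) + np.sqrt(0.05) * rng.standard_normal(N)

a_true = -2.0 * np.cos(omega1)          # Eq. (5)
omega_back = np.arccos(-a_true / 2.0)   # Eq. (6) maps a back to a frequency
```

Because the peak is narrow for $\alpha$ close to 1, a plain gradient search started far from the optimum may stall, which is consistent with the paper's use of an initial estimate from Swami and Mendel's method.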


3.  Performance  analysis 


To illustrate the performance of the proposed two frequency estimation algorithms, let us assume that p = 1, $A_1 = 1$, $\omega_1 = 0.5\pi$ and w(n) is white with variance $\sigma^2$. Then

$$|C_{4,e}(0,0,0)| = \begin{cases} (3/8)\,|H_1(e^{j\omega_1})|^4 & \text{for Algorithm 1} \\ (3/8)\,|V_1(e^{j\omega_1})|^4 & \text{for Algorithm 2} \end{cases}$$

with the same optimum solution $a_1 = a = 0$ by (5). Figure 1 (a) shows $\log_{10}|C_{4,e}(0,0,0)|$ associated with the peak filter used by Algorithm 2 for $\beta = 0.9$ and $\alpha = 0.9$ (dashed line), 0.95 (dotted line) and 0.99 (solid line), respectively, and Figure 1 (b) shows $|C_{4,e}(0,0,0)|$ instead of $\log_{10}|C_{4,e}(0,0,0)|$ associated with the notch filter used by Algorithm 1 for $\beta = 1$ and $\alpha = 0.9$ (dashed line), 0.95 (dotted line) and 0.99 (solid line), respectively. One can see from these two figures that a single peak (whose magnitude is larger for larger $\alpha$) in Figure 1 (a) and a single notch ($|C_{4,e}(0,0,0)| = 0$) in Figure 1 (b), located at a = 0, are associated with each curve, and that the larger $\alpha$, the narrower is the peak for the former and the notch for the latter.

It can be shown that

$$\hat{C}_{4,e}(0,0,0) \approx C_{4,e}(0,0,0) + \hat{C}_{4,w'}(0,0,0) \qquad (10)$$

where $\hat{C}_{4,w'}(0,0,0)$ is the fourth-order sample cumulant of the Gaussian noise w'(n) in the filter output e(n) due to the presence of w(n). Note that $\hat{C}_{4,w'}(0,0,0)$ itself is a random variable. For the notch filter, it can be shown that for a = 0

$$E[\hat{C}_{4,w'}^2(0,0,0)] > \sigma_1^2$$

where $\sigma_1^2$ is a sum over the lags $l = -N, \ldots, N$ involving $r_{w'}(l)$ and N (the exact expression is not legible in this copy), and $r_{w'}(l)$ (the autocorrelation function of w'(n)) is given by

$$r_{w'}(l) = \begin{cases} \dfrac{2\sigma^2}{1+\alpha^2}, & l = 0 \\[2ex] \dfrac{\sigma^2(\alpha^2 - 1)\,\alpha^{|l|}\cos(l\pi/2)}{(1+\alpha^2)\,\alpha^2}, & l \ne 0 \end{cases}$$

Therefore, $\min\{|C_{4,e}(0,0,0)|\} = 0$ is easily smeared by $\hat{C}_{4,w'}(0,0,0)$ if $\sigma_1 \gg 0$ (low SNR). On the other hand, for the peak filter, it can be shown that for a = 0

$$E[\hat{C}_{4,w'}^2(0,0,0)] < \sigma_2^2 = 1050 \cdot (1 + \ldots$$

(the remainder of this expression is not legible in this copy). One can easily infer that if $\max\{|C_{4,e}(0,0,0)|\}/\sigma_2 = (3/8)|V_1(e^{j0.5\pi})|^4/\sigma_2 \gg 1$, the optimum a = 0 can be accurately estimated even if SNR is low. For instance, $\max\{|C_{4,e}(0,0,0)|\} = 4316 \gg \sigma_2 = 28.6$ for SNR = 0 dB, $\beta = 0.9$ and $\alpha = 0.99$. Therefore, the previous performance analysis leads to the following fact:

(F3) Algorithm 2 outperforms Algorithm 1 for finite data, because the former is more robust to additive noise than the latter.
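The figure of 4316 quoted above can be checked directly from Eq. (4) and the transfer functions (2) and (7). The following sketch, whose helper names are chosen here for illustration, evaluates $(3/8)|V_1(e^{j\omega_1})|^4$ and $(3/8)|H_1(e^{j\omega_1})|^4$ at the optimum $a_1 = -2\cos(\omega_1)$ for $A_1 = 1$ and $\omega_1 = 0.5\pi$.

```python
import numpy as np

def section_response(omega, a1, zero_radius, pole_radius):
    """Frequency response of one second-order section of Eq. (2) or Eq. (7)."""
    z1 = np.exp(-1j * omega)  # z^{-1} evaluated on the unit circle
    num = 1.0 + zero_radius * a1 * z1 + zero_radius ** 2 * z1 ** 2
    den = 1.0 + pole_radius * a1 * z1 + pole_radius ** 2 * z1 ** 2
    return num / den

def c4_magnitude(gain, amplitude=1.0):
    """|C4,e(0,0,0)| for one sinusoid, from Eq. (4)."""
    return 0.375 * amplitude ** 4 * abs(gain) ** 4

omega1 = 0.5 * np.pi
a1 = -2.0 * np.cos(omega1)  # Eq. (5); numerically zero here
alpha, beta = 0.99, 0.9

# Peak filter: zeros at radius alpha*beta, poles at radius alpha.
peak_value = c4_magnitude(section_response(omega1, a1, alpha * beta, alpha))
# Notch filter with beta = 1: zeros on the unit circle, poles at radius alpha.
notch_value = c4_magnitude(section_response(omega1, a1, 1.0, alpha))

print(round(peak_value, 1), notch_value)
```

The peak filter gain at the sinusoid frequency is $(1 - \alpha^2\beta^2)/(1 - \alpha^2) \approx 10.36$, so its fourth power scaled by 3/8 gives roughly 4316, while the notch filter output cumulant vanishes.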


4.  Simulation  results 


As mentioned in Section 2, the SM method [6] was used to provide an initial condition for the proposed two frequency estimation algorithms. In the simulation, thirty independent runs were performed to compute the mean square error (MSE) defined as

$$\mathrm{MSE} = \frac{1}{30}\sum_{j=1}^{30}\sum_{i=1}^{p}\left(f_i - \hat{f}_{i,j}\right)^2 \qquad (11)$$

where $f_i = \omega_i/(2\pi)$ and $\hat{f}_{i,j}$ is the obtained estimate for $f_i$ at the jth run. Two sets of simulation results (p = 1, and p = 2 with $A_1 = A_2$) for measurement noise w(n) assumed to be white Gaussian were obtained using Algorithm 1 with $\beta = 1$ and $\alpha = 0.99$ and Algorithm 2 with $\beta = 0.9$ and $\alpha = 0.99$, respectively.

Let SNR $= A_1^2/(2\sigma^2)$, where $\sigma^2$ is the variance of w(n). Table 1 shows the simulation results for p = 1, $A_1 = 1$, $f_1 = 0.2$, N = 1024, 2048, 4096 and SNR = 0, 5, 10, 15, 20 dB. From this table, one can see that Algorithm 2 performs best, the SM method performs second and Algorithm 1 performs worst. On the other hand, Table 2 shows the corresponding results for p = 2, $A_1 = A_2 = 1$, $f_1 = 0.1$ and $f_2 = 0.2$. From Table 2, one can see that Algorithm 2 performs best except for SNR = 0 dB with N = 1024 and 2048, where the SM method performs best. These simulation results indicate that the SM method may perform better than Algorithm 2 for small N and low SNR. However, Algorithm 1 always performs worst, as predicted by (F3), and its performance for low SNR may not improve even when N is increased (see the results for N = 2048 and 4096 when SNR = 0 dB, 5 dB and 10 dB in Table 2). The reason for this is that although N was doubled, the notch of $\min\{|C_{4,e}(0,0,0)|\} = 0$ in some realizations was severely smeared by $\hat{C}_{4,w'}(0,0,0) \approx \hat{C}_{4,e}(0,0,0)$ in the vicinity of $(a_1, a_2)^T = (-2\cos(0.2\pi), -2\cos(0.4\pi))^T$, where w'(n) was the Gaussian noise in the notch filter output due to measurement noise w(n).

5.  Conclusions

We have presented two frequency estimation algorithms for a given set of noisy sinusoidal signals under the three assumptions (A1) through (A3). Algorithm 1 uses the notch filter and Algorithm 2 uses the peak filter; the former tries to minimize, and the latter tries to maximize, the absolute value of the same single fourth-order cumulant. A performance analysis for the proposed two algorithms was also presented. Then some simulation results obtained by the proposed two algorithms and Swami and Mendel's method were presented for a performance comparison. The presented simulation results indicate that Algorithm 2 performs best for the case of p = 1, and that for the case of p = 2 it performs best except when N is small and SNR is low, where Swami and Mendel's method performs best.
Figure 1. (a) $\log_{10}|C_{4,e}(0,0,0)|$ associated with the peak filter for $\beta = 0.9$ and $\alpha = 0.9$ (dashed line), 0.95 (dotted line) and 0.99 (solid line), respectively; (b) $|C_{4,e}(0,0,0)|$ associated with the notch filter for $\beta = 1$ and $\alpha = 0.9$ (dashed line), 0.95 (dotted line) and 0.99 (solid line), respectively.

Table 1. MSEs associated with the SM method, Algorithm 1 (using the notch filter) and Algorithm 2 (using the peak filter) for p = 1 and $f_1 = 0.2$.

Table 2. MSEs associated with the SM method, Algorithm 1 (using the notch filter) and Algorithm 2 (using the peak filter) for p = 2, $f_1 = 0.1$ and $f_2 = 0.2$.
Abstract

Estimates of Higher Order Statistical quantities (such as the bicoherence) have higher variances than their second-order counterparts. Reliable estimates can be obtained by using longer data records, but in practice this is often not possible. In direct-method bicoherence estimation, estimates from shorter records can be highly dependent on measurement errors and background noise. To try to get around these problems, a new bicoherence measure based on the α-trimmed mean bispectrum is described. Simulations indicate how well this new measure performs compared to the standard bicoherence measure.

1  Introduction 


The discrete bispectrum of a discrete, stationary, stochastic process x(n) can be estimated using a segment-averaging approach [4]; the signal x(n) (n = 1, .., N) is divided into K non-overlapping segments (m = 1, .., K), each of length $N_{DFT}$ ($N = N_{DFT}K$). The $N_{DFT}$-point DFT $X_m(k)$ is computed in each segment m, and the bispectrum is estimated using

$$\hat{B}(k,l) = \frac{1}{K}\sum_{m} X_m(k)\,X_m(l)\,X_m^*(k+l) \qquad (1)$$

in which $\sum_m$ stands for $\sum_{m=1}^{K}$. The variance of this estimate is different in each bifrequency bin (k, l) [3]. The variance can be (approximately) flattened by normalising the bispectrum to form the squared bicoherence $b^2(k,l)$ [4]

$$b^2(k,l) = \frac{|\hat{B}(k,l)|^2}{\hat{S}(k,l)\,\hat{P}(k+l)} \qquad (2)$$

in which

$$\hat{S}(k,l) = \frac{1}{K}\sum_{m} |X_m(k)\,X_m(l)|^2, \qquad \hat{P}(k+l) = \frac{1}{K}\sum_{m} |X_m(k+l)|^2 \qquad (3)$$

are the denominator components and again $\sum_m = \sum_{m=1}^{K}$. The methods used in this paper are equally applicable to other bispectrum normalisations, and similar results (not shown) have been achieved for the skewness function.
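A minimal sketch of the segment-averaged estimates of Eqns. 1 to 3 at a single bifrequency is given below. The function names and the noise-free test signal (tones placed exactly on DFT bins 8, 16 and 24) are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def segment_dfts(x, n_dft):
    """Split x into K non-overlapping segments and take the DFT of each."""
    K = len(x) // n_dft
    segments = np.reshape(np.asarray(x)[:K * n_dft], (K, n_dft))
    return np.fft.fft(segments, axis=1)  # shape (K, n_dft)

def squared_bicoherence(X, k, l):
    """Squared bicoherence at bin pair (k, l); requires k + l < n_dft."""
    raw = X[:, k] * X[:, l] * np.conj(X[:, k + l])   # per-segment B_m(k, l)
    B = raw.mean()                                   # Eqn. 1
    S = np.mean(np.abs(X[:, k] * X[:, l]) ** 2)      # Eqn. 3, first term
    P = np.mean(np.abs(X[:, k + l]) ** 2)            # Eqn. 3, second term
    return np.abs(B) ** 2 / (S * P)                  # Eqn. 2

def three_tone_signal(K, n_dft, coupled, rng):
    """Hypothetical test signal: tones at bins 8, 16, 24 with random phases."""
    n = np.arange(n_dft)
    out = []
    for _ in range(K):
        p1, p2 = rng.uniform(0, 2 * np.pi, 2)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * np.pi)
        out.append(np.cos(2 * np.pi * 8 * n / n_dft + p1)
                   + np.cos(2 * np.pi * 16 * n / n_dft + p2)
                   + np.cos(2 * np.pi * 24 * n / n_dft + p3))
    return np.concatenate(out)

rng = np.random.default_rng(1)
b2_coupled = squared_bicoherence(
    segment_dfts(three_tone_signal(64, 64, True, rng), 64), 8, 16)
b2_uncoupled = squared_bicoherence(
    segment_dfts(three_tone_signal(64, 64, False, rng), 64), 8, 16)
```

With phase coupling every segment contributes the same phase to the average in Eqn. 1, so the squared bicoherence approaches 1; without coupling the per-segment phases are random and the average is small.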

Previous applications of the bicoherence for Quadratic Phase Coupling (QPC) detection [4, 1] have considered coupled sinusoids in white Gaussian background noise, but there has been no investigation into how well the bicoherence detects QPC if the background noise includes disturbances such as transients. Furthermore, previous analyses have typically used long data records such that $K \approx N_{DFT}$, but in practical applications the data length N (and thus K) may be limited.

Under these more demanding conditions, new problems can arise. Although the bispectral estimate (Eqn. 1) is asymptotically complex normal [3], if K is small the distributions of $\Re[B(k,l)]$ and $\Im[B(k,l)]$ may be non-normal. Furthermore, bispectral estimates from short records are small-sample estimates of large-variance quantities, and so occasional large values (possibly due to estimation errors, or to external transients) can exert a strong influence over the bispectral estimate. In other words, the distributions of bispectral estimates based on small, noisy samples may have long tails, and so bispectral averages formed using the mean estimator (as in Eqn. 1) may be susceptible to outliers.

The new method developed in this paper is based on forming a bispectral estimate without using the values in the tails of the distribution. Obviously this will reduce the variance of the estimate, and in the case where the sources of error described above are small the new estimate will be worse than the raw estimate. However, in cases where the sources of error are significantly large (and this can often be gleaned from inspection of the time series and power spectrum) the new method can result in improved estimates.

The key assumption in this new method is that the sources of error described above influence the bispectral estimate in a small number of segments only, i.e. that extreme bispectral values (due to either measurement errors or transients) occur only in a small number of segments. By excluding these it is hoped that the resulting bispectral estimate, and subsequent bicoherence estimates, will be more robust. Previous applications of robust techniques to HOS have been limited to time domain parameter estimation problems [5].

2  An α-trimmed mean estimator for the bispectrum

The steps in the computation of the α-trimmed-mean bispectrum estimate will now be described. These are based on the α-trimmed mean algorithm described in [6]. The α-trimming is applied to the real and imaginary parts of the bispectral estimate separately, because α-trimming is only appropriate on signals which are symmetrically distributed [6]. (For this reason it cannot be applied directly to the bicoherence estimate.) The algorithm is described below.

1. Divide the time series x(n) (n = 1, ..., N) into K segments (for clarity it is assumed that there is no overlapping of the frames).

2. Compute the raw bispectral estimates $B_m(k,l)$ (m = 1, .., K) (see Eqn. 1).

3. For each (k, l) form two vectors $\mathbf{r} = [r_1, \ldots, r_K]$ and $\mathbf{i} = [i_1, \ldots, i_K]$, each containing K integers. Each integer in $\mathbf{r}$ (or $\mathbf{i}$) identifies a segment m, and hence a value of $\Re[B_m(k,l)]$ (or $\Im[B_m(k,l)]$). The integers in $\mathbf{r}$ and $\mathbf{i}$ are arranged so that

$$\Re[B_{r_1}(k,l)] = \min_m \Re[B_m(k,l)], \quad \Re[B_{r_K}(k,l)] = \max_m \Re[B_m(k,l)],$$
$$\Im[B_{i_1}(k,l)] = \min_m \Im[B_m(k,l)], \quad \Im[B_{i_K}(k,l)] = \max_m \Im[B_m(k,l)]. \qquad (4)$$

Note that the ordering is done separately for real and imaginary parts. $\mathbf{r}$ and $\mathbf{i}$ thus determine the order statistics [6] of the real and imaginary parts of the raw segmental bispectrum estimates.

4. The α-trimmed mean estimate at a particular bifrequency (k, l) is then evaluated as the sum of the α-trimmed real and imaginary parts,

$$\hat{B}_\alpha = \frac{1}{K(1-2\alpha)}\left\{ (1-r)\left[\Re[B_{r_{g+1}} + B_{r_{K-g}}] + j\,\Im[B_{i_{g+1}} + B_{i_{K-g}}]\right] + \sum_{m=g+2}^{K-g-1}\left(\Re[B_{r_m}] + j\,\Im[B_{i_m}]\right) \right\} \qquad (5)$$

where g is the largest integer less than or equal to $\alpha K$, $r = \alpha K - g$, and the (k, l) has been dropped for clarity. Eqn. 5 is a summation over the segments identified by the middle K - 2g values of $\mathbf{r}$ and $\mathbf{i}$.

5. This estimate is thus formed for all bifrequencies (k, l) of interest.

This estimate is based on the values of the real and imaginary parts of the bispectrum estimates. It discards the contributions to $\hat{B}$ of a fraction of segments. If α is increased then the contributions from more segments will be discarded.
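The trimming steps above can be sketched as follows. This is an illustrative reading of Eqn. 5 at one bifrequency; the function names, the small tolerance added before taking the floor, and the guard on K are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np

def alpha_trimmed_mean(values, alpha):
    """Alpha-trimmed mean of a real vector, following the form of Eqn. 5."""
    values = np.asarray(values, dtype=float)
    K = len(values)
    g = int(np.floor(alpha * K + 1e-12))   # largest integer <= alpha*K
    r = max(alpha * K - g, 0.0)            # fractional part of alpha*K
    if K - 2 * g < 2:
        raise ValueError("need K - 2g >= 2 so the two edge terms are distinct")
    s = np.sort(values)                    # order statistics
    # s[g] is the (g+1)th and s[K-g-1] the (K-g)th order statistic
    edge = (1.0 - r) * (s[g] + s[K - g - 1])
    middle = s[g + 1:K - g - 1].sum()      # order statistics g+2 .. K-g-1
    return (edge + middle) / (K * (1.0 - 2.0 * alpha))

def alpha_trimmed_bispectrum(raw, alpha):
    """Trim real and imaginary parts of the per-segment B_m(k, l) separately."""
    raw = np.asarray(raw)
    return (alpha_trimmed_mean(raw.real, alpha)
            + 1j * alpha_trimmed_mean(raw.imag, alpha))

# Hypothetical per-segment values with one outlier in the real part.
raw = np.array([1 + 1j, 2 + 1j, 3 + 1j, 4 + 1j, 100 + 1j])
trimmed = alpha_trimmed_bispectrum(raw, 0.2)
untrimmed = alpha_trimmed_bispectrum(raw, 0.0)
```

The weights in Eqn. 5 sum to $2(1-r) + (K - 2g - 2) = K(1-2\alpha)$, which is why that factor appears as the normalisation; with α = 0 the expression reduces to the ordinary mean.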

It is important to stress that the list of segments for which bispectral values are discarded can be different at different bifrequencies. This is intended to accommodate interference such as bandlimited transients, which will affect the raw bispectral estimates at some bifrequencies only.

Furthermore, the choice to apply the α-trimming algorithm separately to the real and imaginary parts of the bispectral estimate means that at any given bifrequency, the segments from which the real part of the bispectral estimate is discarded may be different from the segments from which the imaginary part is discarded. The tacit assumption here is that the real and imaginary parts of the segmental bispectral estimates are independent of each other. Although this is asymptotically true (because the estimator is asymptotically complex normal [3]), it is not at this time clear how valid this assumption is in practical situations.

2.1  Normalisation 

Since the contributions to the bispectrum estimate from the tails of the sampling distribution are excluded by the α-trimming technique described above, the denominator of the normalisation in Eqn. 2 also needs to be changed. In order to try to preserve the magnitude of the bicoherence, the following, slightly ad-hoc approach is taken at each bifrequency. Since both real and imaginary parts are treated in the same way, only the real part will be considered here.

• Form the vector $\mathbf{r}$, which lists the segment numbers associated with $\Re[B_m(k,l)]$, ordered according to the size of $\Re[B_m(k,l)]$, as described above.

• The segments listed at the top and bottom of $\mathbf{r}$ (i.e. $r_1, .., r_g, r_{K-g+1}, .., r_K$) are excluded from the α-trimmed estimate $\hat{B}_\alpha$ (Eqn. 5).

• Halve the contributions of these outlying segments ($r_1, .., r_g, r_{K-g+1}, .., r_K$) to the estimates in the denominator (Eqn. 3).

In this last step, the reason for halving, rather than excluding altogether, contributions to denominator estimates is explained as follows. Consider one segment m in which the raw estimates of the numerator and denominator of the bicoherence are $B_m(k,l)$, $S_m(k,l)$ and $P_m(k+l)$. Now the contribution of segment m to the final bispectral estimate will be zero only if both $\Re[B_m(k,l)]$ and $\Im[B_m(k,l)]$ are extreme values (i.e. so both the real and the imaginary parts are trimmed). In such circumstances it is desirable that the contributions to $S(k,l)$ and $P(k+l)$ are also zero. If, on the other hand, $\Re[B_m(k,l)]$ is trimmed but $\Im[B_m(k,l)]$ is not, then the segment m does contribute something to the numerator estimate $\hat{B}_\alpha$, and so it should also contribute something to the denominator estimates $\hat{P}_\alpha$ and $\hat{S}_\alpha$.

Although this method seems to work well, it is not satisfactory from a mathematical perspective, and finding a better way of doing this is a topic of current research.

2.2  An  efficient  implementation 


The algorithm presented above can be very memory consuming, since the raw bispectral estimate for every segment has to be stored before the order statistics can be computed. Since the trimming factor α is typically small (0 < α < 0.2), a more efficient algorithm can be constructed by rewriting the estimation equation; instead of a summation over the segments which are not in the distribution tails (as in Eqn. 5), rewrite this as a summation over all segments followed by a subtraction of the tail values. Using this implementation, only the 2g + 2 (real and imaginary) tail values need to be stored, resulting in a large saving in memory needs. However, the new implementation requires a local sort¹ on each segment in turn.


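The form for this algorithm becomes evident by writing

$$\sum_{m=1}^{K} x_m = \sum_{m=1}^{g+1} x_m + \sum_{m=g+2}^{K-g-1} x_m + \sum_{m=K-g}^{K} x_m, \qquad (6)$$

and substituting for the summation $\sum_{m=g+2}^{K-g-1}$ in Eqn. 5. The estimate is then written as

$$\hat{B}_\alpha = \frac{1}{K(1-2\alpha)}\left\{ \sum_{m=1}^{K} B_m - r\left[\Re[B_{r_{g+1}} + B_{r_{K-g}}] + j\,\Im[B_{i_{g+1}} + B_{i_{K-g}}]\right] - \sum_{m=1}^{g}\left(\Re[B_{r_m}] + j\,\Im[B_{i_m}]\right) - \sum_{m=K-g+1}^{K}\left(\Re[B_{r_m}] + j\,\Im[B_{i_m}]\right) \right\} \qquad (7)$$

Four vectors $\mathbf{R}^{(l)}$, $\mathbf{R}^{(r)}$, $\mathbf{I}^{(l)}$ and $\mathbf{I}^{(r)}$ (left-hand and right-hand tails of the real and imaginary parts) each store g + 1 extreme values of the real and imaginary parts of the raw estimates as m = 1, .., K. For example, $\mathbf{R}^{(l)}$ stores the left-hand extreme values (i.e. values in the left-hand tail of the distribution of $\Re[B_m(k,l)]$). When $B_m(k,l)$ is calculated for a new segment m, $\Re[B_m(k,l)]$ is placed as the (g + 2)th element of $\mathbf{R}^{(l)}$. A sort is carried out, so that $R^{(l)}_1 = \min_{m=1,..,g+2} R^{(l)}_m$ and $R^{(l)}_{g+2} = \max_{m=1,..,g+2} R^{(l)}_m$. The value in $R^{(l)}_{g+2}$ is discarded. When all K segments have been processed in this way, these four vectors $\mathbf{R}^{(l)}$, $\mathbf{R}^{(r)}$, $\mathbf{I}^{(l)}$ and $\mathbf{I}^{(r)}$ will contain the quantities needed to calculate the trimmed mean estimate from Eqn. 7.

Since α is typically about 0.05 this represents a storage saving of roughly 80%² over the standard method of computing the trimmed mean.

¹ A simple bubble sort was used in the current work.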
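The memory-saving idea can be sketched as a running accumulator that keeps only the g + 1 smallest and g + 1 largest values seen so far, plus the running total. The class name and the use of Python's built-in sort (rather than the bubble sort mentioned in the footnote) are assumptions of this sketch; one such tracker would be kept for the real part and one for the imaginary part at each bifrequency.

```python
import numpy as np

class TailTracker:
    """Running alpha-trimmed mean of a real sequence using 2g + 2 stored values."""

    def __init__(self, g):
        self.g = g
        self.low = []     # the g + 1 smallest values seen so far, ascending
        self.high = []    # the g + 1 largest values seen so far, ascending
        self.total = 0.0  # running sum over all segments
        self.count = 0

    def update(self, x):
        """Insert one new value, re-sort each tail and drop the surplus element."""
        self.total += x
        self.count += 1
        self.low = sorted(self.low + [x])[:self.g + 1]
        self.high = sorted(self.high + [x])[-(self.g + 1):]

    def trimmed_mean(self, alpha):
        """Sum over all values minus the tails, in the spirit of Eqn. 7."""
        K = self.count
        r = max(alpha * K - self.g, 0.0)
        low_sum = sum(self.low[:self.g])        # g smallest values
        high_sum = sum(self.high[1:])           # g largest values
        edge = self.low[self.g] + self.high[0]  # (g+1)th and (K-g)th order stats
        return (self.total - low_sum - high_sum - r * edge) / (K * (1.0 - 2.0 * alpha))

# Example with the footnote's setting: K = 64, alpha = 0.05, hence g = 3.
K, alpha = 64, 0.05
g = int(np.floor(alpha * K))
rng = np.random.default_rng(2)
data = rng.standard_normal(K)

tracker = TailTracker(g)
for value in data:
    tracker.update(float(value))
streamed = tracker.trimmed_mean(alpha)

storage_saving = 1.0 - (2 * g + 2) / K  # fraction of per-part storage avoided
```

The tracker assumes K is known in advance (so that g can be fixed) and that K ≥ 2g + 2, so the two tails never share an element once all segments have been processed.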

3  Results 

We propose that the modified bicoherence, described above, be used as a detector of Quadratic Phase Coupling (QPC) in signal processing environments influenced by background noise and transients.

In order to see how well this measure works in practice, several simulation signals have been analysed. In common with other simulations used to measure the performance of the bicoherence as a QPC detector, the signal [m(n) = x(n) + v(n)] is modelled as the summation of an underlying sinusoidal component [x(n)] and an additive disturbance [v(n)]. x(n) exhibits QPC, but this may be difficult to detect with the ordinary bicoherence because of extraneous noise.

The signal of length N is generated segment by segment as follows:

$$x(n) = \sum_{j=1}^{3} \cos(2\pi f_j n + \phi_j), \qquad (8)$$

with $f_3 = f_1 + f_2$ and $\phi_3 = \phi_1 + \phi_2$. The phases $\phi_1$, $\phi_2$ are re-randomised U[0, 2π) in each frame (this satisfies the Phase Randomisation Assumption which renders the bicoherence magnitude suitable for QPC detection [2]).

hi  previous  applications  of  the  bicohaence  [4,  1]  v{n) 
was  whiteGaussian  noise,  hi  this  papa,  v(n)  =  t(n)-|-^(n) 
is  a  summation  of  randomly  occurring  short-lived  transients 
(modelled  by  damped  sinusoids)  and  white  Gaussian  noise; 

N, 

"(”)  =  E  cos(2;r/ijn  +  Oj  -f-  g{n),  (9) 

i=i 

in which f_tj = U[0, 0.5), θ_j = U[0, 2π], η = U[0, 1) and
g(n) is white Gaussian noise. The transients are triggered
randomly, with the probability of a transient beginning at
any one time sample controlled by a parameter γ.

^1 Based on an assumption that K = 64, α = 0.05, q = 3, the saving is
1 - (2q + 2)/K = 87%.
Figure 1. Left: Time series of m(n). Right: Power
spectra of m(n) (signal plus noise) and x(n) (clean
signal, no noise).


α       b²(0.2, 0.1)    β (%)    notes
0       0.82            1.7      standard method
0.05    0.82            3.2      ≈ 2 segments discarded
0.20    0.84            7.3      ≈ 6 segments discarded

Table 1. Performance measure of new technique.


Fig. 1 shows the time series of one example of the
noisy signal m(n) (N = 1024, γ = 0.02, SNR = 0 dB,
measured as 10 log10 of the signal-to-noise variance ratio),
with f1 = 0.1, f2 = 0.2, together
with the Power Spectrum (N_dft = 64, K = 16) for both
the signal with no noise [x(n)], and the signal with the
transient and steady state noise added [m(n) = x(n) + v(n)].
Clearly the noise has a very detrimental effect on the power
spectrum, almost obscuring the spectral peaks. Fig. 2 shows
the squared bicoherence of m(n) with different levels of
α-trimming. The bicoherence should peak at (0.2, 0.1)
(which is equivalent to (0.1, 0.2) because of symmetry). The
top plot α = 0 corresponds to the ordinary squared
bicoherence estimate - the peak at (0.2, 0.1) is barely visible
above the noise floor. However, the α-trimmed estimates show
much lower noise floors.

The improvement in performance can be measured by

\[ \beta = \frac{b^2(0.2, 0.1)}{\sum_{IT} b^2(k, l)} \times 100\% \]

- the percentage of total bicoherence "energy" which occurs
in the correct bin^2. Better QPC detectors will have higher
values of β. Table 1 shows how this varies for a typical
example of this simulation. It is clear that the α-trimmed
estimates perform better as QPC detectors than the ordinary
squared bicoherence. Further simulation results will be shown
at the conference.
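The performance measure β can be illustrated with a short sketch. It assumes the squared bicoherence is available as a dictionary keyed by integer bin pairs (k, l), and it assumes one common definition of the Inner Triangle (0 ≤ l ≤ k, k + l ≤ N_dft/2); both the data layout and the helper names are hypothetical and are not taken from the paper.

```python
def inner_triangle_bins(n_dft):
    """Bin pairs (k, l) with 0 <= l <= k and k + l <= n_dft // 2.

    Assumption: this is the Inner Triangle convention intended in the text.
    """
    half = n_dft // 2
    return [(k, l) for k in range(half + 1)
            for l in range(k + 1) if k + l <= half]


def beta_percent(b2, peak_bin, bins):
    """Percentage of total squared-bicoherence 'energy' found in peak_bin."""
    total = sum(b2[bin_] for bin_ in bins)
    if total <= 0.0:
        raise ValueError("total bicoherence energy must be positive")
    return 100.0 * b2[peak_bin] / total
```

A detector with a lower noise floor spreads less energy over the other bins, so the same peak value yields a larger β.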


4  Discussion  and  Conclusions 

The proposed QPC detector based on an α-trimmed
bispectral estimate appears to give reduced noise floors in
the simulations investigated so far, and peaks due to QPC are
easier to pick out using this detector than using the standard
bicoherence. In particular the new detector is robust to
interference from additive transients. The normalisation
scheme used in this paper appears to work successfully,
^2 \sum_{IT} denotes summation over the Inner Triangle [3].


Figure 2. Squared bicoherences of m(n). Top: Ordinary
(standard approach). Middle: α = 0.05. Bottom: α = 0.20.

although it does not have a rigorous mathematical basis.
The performance of the new estimator as a QPC detector for
other types of interference (such as Amplitude Modulation)
is a topic of current work, which will also be described at
the conference.
Abstract

The work addresses realistic modelling of generic noise
probability density functions (pdfs), in order to optimize
signal detection in non-Gaussian environments. The target is
a model that depends on few parameters (quick and easy to
estimate) yet is general enough to describe many kinds of
noise (e.g., symmetric or asymmetric, with variable
sharpness). To this end, a new HOS-based model is introduced,
which derives from the generalized Gaussian function and
depends on three parameters: kurtosis (fourth order), for
representing variable sharpness, and left and right
variances (whose combination provides the same
information as skewness - third order) for describing
deviation from symmetry. The model is applied in the
design of a LOD test for detecting signals corrupted by
real underwater acoustic noise in a low-frequency range.

1.  Introduction 

Realistic and simple statistical modelling of generic
background noise is addressed in order to optimize signal
detection in non-Gaussian environments. The purpose of
detection is to decide between the two hypotheses of the
presence (H1) or the absence (H0) of a transmitted
deterministic signal {s_k, k = 1, ..., K} (the approach can be
extended to the stochastic case), on the basis of acquired
observations {y_k, k = 1, ..., K} (an application of binary
hypothesis testing [1]); the noise {n_k, k = 1, ..., K}
corrupting the signal during the propagation is assumed
additive, independent and identically distributed,
stationary, and generally non-Gaussian and unimodal.

The main target of the work is to design a detector
characterized by: (a) high performance in the case of
weak signals; (b) easy applicability to real cases (in
particular, easy and realistic estimation of needed
parameters, realistic noise modelling, and robustness to
variable boundary conditions); (c) algorithmic
simplicity.

Detection optimization in the case of low/middle values
of the Signal-to-Noise Ratio (SNR) (in the range [-30, 0]
dB) (property (a)) is reached by selecting the Locally
Optimum Detector (LOD) [1] as statistical inference
approach.

For  satisfying  conditions  (b)  and  (c),  the  investigation 
is  addressed  to  express  generalized  noise  pdf  models, 
usually  depending  on  parameters  difficult  to  be  estimated 
from  real  data  samples,  in  terms  of  Higher-Order-Statistics 
(HOS)  parameters  [2],  which  are  very  easy  and  quick  to  be 
extracted  from  data  and  are  particularly  suitable  for 
quantifying  deviation  from  Gaussianity  in  terms  of 
asymmetry  (with  third-order  parameters)  and  variable 
sharpness  (with  fourth-order  parameters). 

As conventional signal processing algorithms based on
Second Order Statistics, optimized in the presence of
Gaussian noise, may degrade in non-Gaussian noise, various
works used HOS theory [2] as signal-processing basis for
noise analysis and detection optimization; however, some
methods work only with non-Gaussian signals [3][4][5] or
only in Gaussian noise [5][6][7]; some can be applied only
under certain assumptions of noise distributions [8][9];
some are not optimized in the case of weak signals [3];
finally some algorithms are complicated [8].

In  order  to  overcome  at  least  some  of  the  aforesaid 
limitations  and  improve  robustness,  simplicity  and 
generality  of  HOS-based  detectors,  the  parametric 
asymmetric  generalized  Gaussian  pdf  model  is  introduced. 
It derives from the combination of the well-known
generalized Gaussian pdf [10] and of the asymmetric
Gaussian model presented in [11].

The first model is symmetric and depends on a real
parameter, c, which is not easy to estimate from data.
Nevertheless, c has a physical meaning, as it is linked
with the pdf sharpness. The HOS parameter which better
describes sharpness variability is the fourth-order kurtosis,
β2. Hence the analytical relationship between c and β2 is
introduced (see [12] for details). The resulting symmetric
function based on kurtosis has the same characteristics as
the generalized Gaussian, and is a realistic noise-pdf
model for 1.865 < β2 ≤ 30.

In order to introduce into this variable-sharpness model
also possible deviation from symmetry, the resulting
kurtosis-based function is modified by taking into account
the asymmetric Gaussian model [11]. It directly derives
from the Gaussian shape, but is asymmetric and depends
on two second-order parameters, the left and right
variances [11]. By introducing these two parameters in the
kurtosis-based generalized Gaussian function, the
"asymmetric generalized Gaussian" model can be
obtained.

The  new  model  is  compared  with  the  generalized 
Gaussian  and  the  asymmetric  Gaussian  pdfs,  which  result 
as  its  particular  cases.  It  is  applied  in  the  design  of  a  LOD 
test,  used  for  detecting  deterministic  signals  corrupted  by 
real  underwater  acoustic  noise  radiated  by  ship  traffic 
[13]. 


2.  The  asymmetric  generalized  Gaussian  pdf 


In the context of noise modelling, one of the most
noticeable ways in which estimated noise distributions
deviate from Gaussianity is in kurtosis β2, the ratio of
the fourth central moment to the square of the second
central moment. It is equal to 3 in the Gaussian case; the
sharpness of the pdf shape is higher (lower) than that of the
corresponding Gaussian function as β2 is larger (smaller)
than 3. A good model for general pdfs has variable sharpness.

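One of the well-known symmetric pdf models is the
generalized Gaussian, which depends on the parameter c:

\[ p_{gG}(n) = \frac{\gamma c}{2\Gamma(1/c)} \, e^{-|\gamma (n-\mu)|^{c}} \qquad (1) \]

where {n} is generic noise with mean value μ and variance
σ², and

\[ \gamma = \frac{1}{\sigma}\sqrt{\frac{\Gamma(3/c)}{\Gamma(1/c)}}, \qquad \Gamma(x) = \int_{0}^{+\infty} t^{x-1} e^{-t}\, dt . \]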

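c cannot be directly estimated from data samples; hence
the relationship between c and β2 was found [12]. It
derives from the β2 definition and is expressed by the
following formula:

\[ \beta_2 = \frac{E\{(n-\mu)^4\}}{[E\{(n-\mu)^2\}]^2} = \frac{\Gamma(5/c)\,\Gamma(1/c)}{(\Gamma(3/c))^2} \;\Rightarrow\; c = c(\beta_2) \cong \sqrt{\frac{5}{\beta_2 - 1.865}} - 0.12 \quad \text{for } 1.865 < \beta_2 < 30. \qquad (2) \]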

This formula allows one to express the generalized Gaussian
pdf in terms of β2 [12]. Its validity is confirmed by
observing that for β2 > 3 the resulting pdf has heavy tails,
as expected [10].
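The link between the shape parameter c and the kurtosis β2 can be checked numerically. The sketch below uses only the exact Gamma-function expression for the kurtosis of a generalized Gaussian, β2 = Γ(5/c)Γ(1/c)/Γ(3/c)², and inverts it by bisection; the function names, the search bracket and the tolerance are illustrative assumptions, and the bisection is a stand-in for (not a reproduction of) the closed-form relationship of [12].

```python
import math


def gg_kurtosis(c):
    """Kurtosis of a generalized Gaussian with shape parameter c > 0."""
    return math.gamma(5.0 / c) * math.gamma(1.0 / c) / math.gamma(3.0 / c) ** 2


def shape_from_kurtosis(beta2, lo=0.3, hi=20.0, tol=1e-10):
    """Solve gg_kurtosis(c) = beta2 for c by bisection.

    The kurtosis decreases monotonically as c grows, so a sign test on
    gg_kurtosis(mid) - beta2 tells us which half of the bracket to keep.
    """
    if not (gg_kurtosis(hi) < beta2 < gg_kurtosis(lo)):
        raise ValueError("beta2 is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gg_kurtosis(mid) > beta2:
            lo = mid  # kurtosis still too large: c must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, β2 = 3 maps back to c = 2 (Gaussian) and β2 = 6 to c = 1 (Laplacian), which is the behaviour the text describes: larger kurtosis corresponds to a sharper, heavier-tailed pdf.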

In order to generalize this model so that it can be also
asymmetric, the asymmetric Gaussian model presented in
[11] is taken into account. It depends on two second-order
parameters (deriving from the definition of variance), σ_l²
and σ_r², called respectively "left" and "right variances"
and estimated from finite sequences of the process {n}
according to the following formulas:

\[ \sigma_l^2 = \frac{1}{N_l - 1} \sum_{k=1,\; n_k < \mu}^{N_l} (n_k - \mu)^2 \quad \text{and} \quad \sigma_r^2 = \frac{1}{N_r - 1} \sum_{k=1,\; n_k > \mu}^{N_r} (n_k - \mu)^2 \qquad (3) \]

where N_l (N_r) is the number of n_k samples < μ (> μ). The
model expression follows:


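\[ p_{aG}(n) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,(\sigma_l + \sigma_r)} \, e^{-\frac{(n-\mu)^2}{2\sigma_l^2}}, & n < \mu \\[2ex] \dfrac{2}{\sqrt{2\pi}\,(\sigma_l + \sigma_r)} \, e^{-\frac{(n-\mu)^2}{2\sigma_r^2}}, & n \ge \mu \end{cases} \qquad (4) \]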
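A minimal sketch of the left/right variance estimates and of the asymmetric Gaussian pdf is given below. It assumes the estimators are the sums of squared deviations from μ over the samples below and above μ, each divided by (count - 1), and that the pdf is two half-Gaussians sharing the normalisation 2/(√(2π)(σ_l + σ_r)); the function names are hypothetical.

```python
import math


def left_right_variances(samples, mu):
    """Estimate (var_l, var_r) from the samples below and above mu."""
    left = [(x - mu) ** 2 for x in samples if x < mu]
    right = [(x - mu) ** 2 for x in samples if x > mu]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two samples on each side of mu")
    return sum(left) / (len(left) - 1), sum(right) / (len(right) - 1)


def asym_gauss_pdf(n, mu, var_l, var_r):
    """Asymmetric Gaussian: half-Gaussians with different spreads."""
    sigma_l, sigma_r = math.sqrt(var_l), math.sqrt(var_r)
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_l + sigma_r))
    var = var_l if n < mu else var_r  # pick the side-specific variance
    return norm * math.exp(-((n - mu) ** 2) / (2.0 * var))
```

With equal left and right variances the function reduces to the ordinary Gaussian density, and for any positive pair of variances it integrates to one because the two half-areas are proportional to σ_l and σ_r.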


Like the kurtosis-based generalized Gaussian
model, it is analytically simple and easy to estimate if
some data sequences are available (the model includes the
Gaussian case for σ_l² = σ_r²). The left and right variances
are linked with the variance (the well-known second-order
parameter) and with the skewness (the third-order
parameter describing and quantifying pdf asymmetry) as
follows [11]:


7  2  2 

(where  E  is  the  expectation  value). 

In  a  similar  way,  these  two  parameters  are  introduced 
in  the  kurtosis-based  generalized  Gaussian  pdf  in  order  to 
transform  it  into  the  following  asymmetric  generalized 
Gaussian  model: 


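\[ p_{agG}(n) = \begin{cases} \dfrac{c\,\gamma_a}{\Gamma(1/c)} \, e^{-[\gamma_l (\mu - n)]^{c}}, & n < \mu \\[2ex] \dfrac{c\,\gamma_a}{\Gamma(1/c)} \, e^{-[\gamma_r (n - \mu)]^{c}}, & n \ge \mu \end{cases} \qquad (6) \]

where

\[ \gamma_a = \frac{1}{\sigma_l + \sigma_r} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2}, \qquad \gamma_l = \frac{1}{\sigma_l} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2}, \qquad \gamma_r = \frac{1}{\sigma_r} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2} . \]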
It is easy to notice that if σ_l² = σ_r², then the pdf
coincides with the generalized Gaussian, hence it is
symmetric; if σ_l² = σ_r² and β2 = 3, it coincides with
the Gaussian model. Figure 1 presents a family of the pdf
as β2 varies.
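The asymmetric generalized Gaussian pdf can be sketched in a few lines. The code assumes the form p(n) = cγ_a/Γ(1/c) · exp(-[γ_s |n - μ|]^c), with γ_s = γ_l for n < μ and γ_s = γ_r otherwise, where γ_l, γ_r and γ_a are the square root of Γ(3/c)/Γ(1/c) divided by σ_l, σ_r and (σ_l + σ_r) respectively; this form is chosen because it normalises to one and collapses to the generalized Gaussian when the two variances coincide. The function name is illustrative.

```python
import math


def asym_gen_gauss_pdf(n, mu, var_l, var_r, c):
    """Asymmetric generalized Gaussian density at n (assumed form, see text)."""
    g = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c))
    sigma_l, sigma_r = math.sqrt(var_l), math.sqrt(var_r)
    gamma_l = g / sigma_l
    gamma_r = g / sigma_r
    gamma_a = g / (sigma_l + sigma_r)
    scale = c * gamma_a / math.gamma(1.0 / c)
    if n < mu:
        return scale * math.exp(-((gamma_l * (mu - n)) ** c))
    return scale * math.exp(-((gamma_r * (n - mu)) ** c))
```

Setting c = 2 with equal variances reproduces the Gaussian density, while c < 2 gives sharper, heavier-tailed shapes and c > 2 flatter ones.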


Fig. 1. Asymmetric generalized Gaussian pdf family
(σ_l² = 1, σ_r² = 2, μ = 0).

3. The LOD test designed on the basis of the new model

The model is suitable for the design of a LOD test [1],
as the non-linearity g_LO(y) and the maximum asymptotic
relative efficiency ρ can be expressed in terms of
elementary functions. In particular:


gLoiy)  = 


1 


y<\i 

y>\i 


(7) 


Y^c^r(2-l/c)(Y/+y^)  2  2  __  X 

^ - f(I77) - ^  ‘ 


The  respective  graphs  are  presented  in  Figs.  2  and  3. 
From  their  analysis  it  is  easy  to  conclude  that  the  test 
works  better  for  values  of  the  kurtosis  larger  than  3  (i.e., 
for  super-Gaussian  and,  in  particular,  for  impulsive  noise 
pdfs),  as  expected;  nevertheless  it  can  reach  good 
performances  even  in  more  critical  conditions  of  sub- 
Gaussian  noise. 
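As an illustration of how such a non-linearity can be evaluated, the sketch below applies the standard definition of the locally optimum non-linearity, g_LO(y) = -p'(y)/p(y), to a pdf of the assumed form p(y) ∝ exp(-[γ_s |y - μ|]^c) with side-dependent scale γ_s. The resulting closed form, the function name and the treatment of the point y = μ are assumptions of this example rather than the expression printed in the paper.

```python
import math


def lo_nonlinearity(y, mu, var_l, var_r, c):
    """g_LO(y) = -p'(y)/p(y) for the assumed asymmetric generalized Gaussian.

    Valid as written for c >= 1; at y == mu the value is set to 0.
    """
    g = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c))
    if y == mu:
        return 0.0
    if y < mu:
        gamma_l = g / math.sqrt(var_l)
        return -c * gamma_l ** c * (mu - y) ** (c - 1.0)
    gamma_r = g / math.sqrt(var_r)
    return c * gamma_r ** c * (y - mu) ** (c - 1.0)
```

For c = 2 the function is piecewise linear with slopes 1/σ_l² and 1/σ_r², i.e. the familiar linear detector of the Gaussian case on each side of μ; for c < 2 large observations are progressively de-emphasised, which is consistent with the better behaviour reported for impulsive noise.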

In order to deduce test performance from a theoretical
point of view, these graphs can be compared with similar
graphs for other well-known pdfs: for example, LOD
non-linearities and maximum ARE curves computed in terms
of the c-based generalized Gaussian, the generalized
Cauchy, and the generalized beta functions are presented in
[10]; those expressed in terms of the asymmetric Gaussian
and the kurtosis-based generalized Gaussian pdfs are
shown in [12][14].


Fig. 2. LO non-linearities g_LO(y) for asymmetric generalized
Gaussian noise as β2 varies (σ_l² = 3, σ_r² = 1).


4.  Experimental  results  on  real  data 

From an experimental point of view, the capability of
the proposed model to describe generalized noise pdfs
realistically was evaluated by applying it to the problem of
detecting known constant signals corrupted by underwater
acoustic ship-traffic-radiated noise [13]. The noise data
sequences were analysed and characterized on average by
the estimated parameters μ = -17.5, β2 = 2.51, σ_l = 1550 and
σ_r = 1350 [13]. A comparison between the noise histogram
(computed on 10 records of 10000 samples) and the new
pdf model estimated on the basis of the aforesaid second-
and fourth-order parameters can be deduced from Figs. 4
and 5: a good fit between data and model is shown.
Fig. 4. The non-normalized data histogram (real data,
3e+004 samples).


Fig. 6. Comparison of test results. P_FA = 0.05.


The detection performance obtained by applying the
HOS-based model to a LOD test is presented by means
of experimental curves of the Detection Probability P_D as
SNR varies, given a certain value of the Probability of
False Alarm, P_FA. In the diagram of Fig. 6 the
performance obtained by using the new pdf model
(depending on the left and right variances and on the
kurtosis) is compared with that provided by an
asymmetric Gaussian model (depending only on the left
and right variances).

The performance improvement is mainly associated with the
capability of the new pdf to model non-Gaussian noise in
a more realistic way, so that it can be better filtered.
Abstract

This paper considers the use of composite property
mappings for MA cumulant matching. The algorithm makes use
of two property mappings corresponding to rank and
structure properties of a matrix consisting of MA cumulants.
It is proved that these two properties are sufficient to
characterise a matrix consisting of true MA cumulants. This
result clearly implies that, provided that convergence is
achieved, the composite property mapping algorithm performs
some kind of cumulant matching. The issue of convergence is
also discussed in the paper. Numerical results are presented
to show the performance of the algorithm. Keywords: Higher
Order Statistics, System Identification.
Figure 1. The cumulant matrix used in the CPMA
algorithm. It is assumed that c_{τ1 τ2} = c_{3,x}(τ1, τ2).


Consider the following finite impulse response (FIR)
signal model: x(t) = \sum_{i=0}^{q} h(i) w(t - i), where the system
input is assumed to be a non-Gaussian, independent
identically distributed (IID), random process with E{w(t)} = 0,
E{w(t)w(t + n)} = γ2 δ(n), and E{w(t)w(t + n1)w(t + n2)}
= γ3 δ(n1, n2). We assume that h(0), h(q) ≠ 0. A
method for the enhancement of third order cumulants of MA
models was presented in [1]. That method is based on the
use of Composite Property Mapping Algorithms (CPMA)
[3]. The reader is referred to [3] for further information
on CPMA and to [5] for a general introduction to set
theoretic estimation. Composite property mapping algorithms
were originally used within a HOS framework in [4]
for blind array processing. This is a follow-up to the work
described in [1]. Some of the material presented here is also
included in [2].

2  Cumulant  Enhancement 

In the following we summarise the main steps involved
in the method of [1]:

1. Collect the sample cumulants corresponding to the
minimal sufficient set of lags in the following vector:

\[ \theta_0 = [c_{3,x}(0,0),\, c_{3,x}(0,1),\, c_{3,x}(1,1),\, \ldots,\, c_{3,x}(0,q),\, \ldots,\, c_{3,x}(q-1,q),\, c_{3,x}(q,q)]^T. \qquad (1) \]

2. The elements of θ_0 are then used to build the matrix C_0
depicted in figure 1. C_0 is a (2q + 1) × (2q + 1) matrix.
The matrix C_0 contains all the third order cumulants in
the vector θ_0. It can be shown that a matrix C which
results from matrix C_0 after replacing sample cumulants
with true MA cumulants possesses the following two
theoretical properties:

(a) rank(C) = q + 1

(b) The structural composition of C is determined by
a characteristic matrix A and we say that C is a
linear structured matrix. (More details in [1])

3. We perform the following iteration:

\[ C_{k+1} = F(C_k) = F_A(F_{q+1}(C_k)), \quad \text{until } 1 - r_{q+1}(C_{k+1}) < \epsilon, \]

where ε is a predefined small positive number. The
mappings F_A, F_{q+1} and the quantity r_{q+1}(C_{k+1}) are
defined as follows:

(a) The mapping F_{q+1} corresponds to the
rank property of C. It is implemented
using SVD (SVD-based rank reduction):
F_{q+1}(X) = \sum_{k=1}^{q+1} \sigma_k u_k v_k^T, where
X ∈ R^{(2q+1)×(2q+1)} and X = \sum_{k=1}^{2q+1} \sigma_k u_k v_k^T.

(b) The mapping F_A corresponds to the
linear-structure property of C:
F_A(X) = T^{-1}(A [A^T A]^{-1} A^T T(X)).

(c) It is possible to get an idea of how close a matrix
X is to a rank-(q + 1) matrix, by examining how
close the following quantity is to 1, provided that
σ_{q+1} > σ_{q+2}:


(2) 

Z^ifc=i 

3  CPMA  for  Cumulant  Matching 


In [1] this composite property mapping method was used
as a preprocessing step before applying some linear methods
for system identification. A question that arises naturally is
whether a matrix that possesses both properties defined in
the previous section contains true cumulants of some MA
model.

To start with we assume that the matrix sequence
generated by the iterative algorithm described earlier converges
to a matrix S, which has both the desired structure and rank
properties. It is interesting to examine whether this matrix
consists of real cumulants of some MA model. Since the
matrix S has the same structural characteristics as those of
a matrix constructed of real cumulants, then if S_{i,j} is a
non-zero element of S, we denote S_{i,j} as s(τ1, τ2), where
(τ1, τ2) are the lags we associate with the (i, j)-element of a
structurally equivalent matrix which is constructed from real
cumulants. Then because of the structure property, the same
symmetries that apply to lags of cumulants will apply to
these associated lags of s(τ1, τ2). In the following it is
assumed that s(0, q), s(q, q) ≠ 0. The following Lemma is
required:


Lemma 1 Suppose that we are given a (2q + 1) × (2q + 1)
matrix S, which has the two prescribed properties (structure
and rank). Then the following equation holds for s(τ1, τ2):

\[ s(j, n) = \sum_{i=0}^{q-n} \frac{s(i,q)\, s(i+n,q)\, s(i+j,q)}{s(0,q)\, s(q,q)} \qquad (3) \]

for n = 0, ..., q - 1 and j = n - q, ..., q.


The proof of this Lemma is given in the Appendix.
Since we know that s(0, q), s(q, q) ≠ 0 we can find a γ3 ≠ 0
such that

\[ \gamma_3 = \frac{s^3(0,q)}{s(0,q)\, s(q,q)} = \frac{s^2(0,q)}{s(q,q)}. \qquad (4) \]

If we combine equations (3) and (4), we obtain the
following:

\[ s(j, n) = \frac{s^3(0,q)}{s(0,q)\, s(q,q)} \sum_{i=0}^{q-n} \frac{s(i,q)}{s(0,q)} \frac{s(i+n,q)}{s(0,q)} \frac{s(i+j,q)}{s(0,q)} = \gamma_3 \sum_{i=0}^{q-n} \frac{s(i,q)}{s(0,q)} \frac{s(i+n,q)}{s(0,q)} \frac{s(i+j,q)}{s(0,q)} \qquad (5) \]

Equation (5) shows that s(j, n) is the third order cumulant of
an MA model with parameters h(i) = s(i, q)/s(0, q). Thus
the following theorem holds:

Theorem 1 Every (2q + 1) × (2q + 1) matrix S possessing
the structure and rank properties defined in this section
consists of real cumulants of some MA(q) model.

The above theorem implies that the cumulant enhancement
method summarised earlier, when it converges to a matrix
with the prescribed properties, performs some kind of
cumulant matching.
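The parametrisation h(i) = s(i, q)/s(0, q) can be illustrated with true MA cumulants. The sketch below assumes the standard expression for the third order cumulants of an MA(q) process driven by IID noise with third order cumulant γ3, namely c(τ1, τ2) = γ3 Σ_i h(i) h(i + τ1) h(i + τ2), and a model normalised so that h(0) = 1; the function names are illustrative only.

```python
def ma_third_order_cumulant(h, gamma3, tau1, tau2):
    """c(tau1, tau2) = gamma3 * sum_i h(i) h(i + tau1) h(i + tau2)."""
    q = len(h) - 1

    def coef(i):
        # h is zero outside the support 0..q
        return h[i] if 0 <= i <= q else 0.0

    return gamma3 * sum(
        coef(i) * coef(i + tau1) * coef(i + tau2) for i in range(q + 1)
    )


def parameters_from_cumulants(h, gamma3):
    """Recover h(i) as c(i, q) / c(0, q), assuming h(0) = 1."""
    q = len(h) - 1
    c0q = ma_third_order_cumulant(h, gamma3, 0, q)
    return [ma_third_order_cumulant(h, gamma3, i, q) / c0q for i in range(q + 1)]
```

For a model such as h = [1, -0.5, 0.25] the ratios c(i, q)/c(0, q) return the original coefficients, and c(0, q)²/c(q, q) returns γ3, which is the sense in which a matrix of valid cumulants determines an MA(q) model.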

4  Convergence  Properties 

An important issue that needs to be addressed here is that
of convergence. Let us consider a matrix sequence
generated according to the algorithmic rule,

\[ C_k = F(C_{k-1}) \quad \text{for } k \ge 1 \qquad (6) \]

in which the initial matrix C_0 is the experimentally
generated matrix of Section 2. Then at every step, the matrix
C_k has the right structure and is "nearer" to a matrix with
rank q + 1 than C_{k-1}. In iterative mappings of this type,
convergence is guaranteed only when all property sets are
convex [5]. However, in our case, it is obvious that the set of
matrices with rank q + 1 is not convex, and this violates the
assumptions required for Theorem (1) in [3]. In [7], Dologlou
et al. provide an interesting theorem which shows that the norm
of the difference C_k - C_{k-1} and the distance between C_k
and the set of matrices with rank less than or equal to q + 1
both converge to 0 when k → ∞. The behavior predicted by
this theorem is verified by our numerical simulations
(summarised in figs. 2 and 3). For system identification the
following methods are used: the Least Squares method of [2]
(LS), the Closed Formula (CF) of [6] and a nonlinear method
for cumulant matching [8]. It has been observed though, that
for some MA models convergence in the sense described in
Figure 2. The singular value ratio (2), the square
error of the enhanced cumulants and the square error of
the estimated parameters as a function of number of
iterations. The dark curve represents the CF method
[6] and the light curve represents the LS method [2].


Figure 3. System identification results after 100
Monte Carlo runs. Length of output sequence: 3000
samples. The horizontal lines represent the true
values of the system parameters (model order 3). The
vertical bars represent the average estimate +/- standard
deviation. The last graph shows the MSE of the
estimation.


[7] can sometimes require thousands of iterations.
Consequently there are cases where practically we cannot
achieve convergence. In these cases, the algorithm can still be
used for preprocessing the cumulants as proposed in [1]. Since
convergence can sometimes be difficult to achieve, composite
mapping algorithms cannot replace existing nonlinear
methods for cumulant matching. The drawback of nonlinear
cumulant matching methods is that there is always a danger
of becoming trapped in a local minimum. To avoid this
occurrence, good initial conditions are required
and these are usually provided by linear methods. Because
cumulant matching does not require any initial conditions,
it can be applied^1 prior to linear methods in order to provide
improved initial conditions for the nonlinear methods.

References 

[1] A. G. Stogioglou, S. McLaughlin. Third Order Cumulant
Enhancement for MA Models. In Proceedings of IEEE Workshop
on HOS, June 1995.

[2] A. G. Stogioglou, S. McLaughlin. MA Cumulant Enhancement
and Parameter Estimation. IEEE Transactions on Signal
Processing, July 1996.

[3] J. Cadzow. Signal Enhancement - A Composite Property
Mapping Algorithm. IEEE Transactions on Acoustics, Speech and
Signal Processing, 36(1):49-62, 1988.

^1 without necessarily achieving convergence


[4] J.-F. Cardoso. Fourth-Order Cumulant Structure Forcing.
Application to Blind Array Processing. In Proceedings IEEE SP
Workshop on SSAP-92, 1992.

[5] P. L. Combettes. The Foundations of Set Theoretic Estimation.
Proceedings of the IEEE, 81(2):181-208, February 1993.

[6] G. Giannakis. Cumulants: A Powerful Tool in Signal
Processing. Proceedings of the IEEE, 75(9):1333-1334, September
1987.

[7] I. Dologlou, J.-C. Pesquet and J. Skowronski. Projection Based
Rank Reduction Algorithms for Multichannel Modelling and
Image Compression. Submitted to Signal Processing, 1995.

[8] K. Lii and M. Rosenblatt. Deconvolution and estimation of
transfer function phase and coefficients for non-Gaussian
linear processes. Ann. Statist., 10:1195-1208, 1982.

APPENDIX 

Proof of Lemma 1: The vectors corresponding to the first q + 1
rows of the matrix S are denoted as s^d where d = 0, ..., q and the
vectors corresponding to the last q rows of the matrix are denoted
by s_{q-1}, ..., s_0. If we assume that s(0, q), s(q, q) ≠ 0, then it is
evident from their structure that the q + 1 vectors s^d, d = 0, ..., q
are linearly independent. Given that the rank of the matrix is q + 1,
we can conclude that the vectors corresponding to the last q rows
of the matrix belong to the space spanned by the first q + 1 rows.
In particular since the last n elements of the vector s_n (n = q - 1,
..., 0) are zero, it can easily be seen that they belong to the space
spanned only by s^d for d = 0, ..., q - n. We can write this as
follows:

\[ s_n \in \mathrm{span}\{s^0, \ldots, s^{q-n}\}, \quad n = q-1, \ldots, 0. \qquad (7) \]

Now if we take n = q - 1, it is straightforward to prove that,

\[ s_{q-1} = \frac{s(1,q)}{s(0,q)}\, s^0 + \frac{s(q-1,q)}{s(q,q)}\, s^1. \qquad (8) \]

In scalar form, this translates to,

\[ s(j, q-1) = \sum_{i=0}^{1} \frac{s(i,q)\, s(i+q-1,q)\, s(i+j,q)}{s(0,q)\, s(q,q)}, \quad j = -1, \ldots, q, \qquad (9) \]

so equation (3) holds for n = q - 1.

Assumption 1: Let us suppose that equation (3) holds for every
n such that q - 1 ≥ n ≥ k, for some k > 0.

We want to prove that it also holds for n = k - 1.

\[ s_{k-1} = \sum_{i=0}^{q-k+1} \lambda_{k-1,i}\, s^i \qquad (10) \]

Because of the cumulant-like symmetries in the lags of
s(k - q - 1, k - 1), the first element of s_{k-1} is
s(k - q - 1, k - 1) = s(q - k + 1, q). It is related with s(0, q)
as follows:

\[ s(q-k+1, q) = \lambda_{k-1,0}\, s(0,q). \qquad (11) \]

From equation (11) we can obtain the value of λ_{k-1,0} =
s(q - k + 1, q)/s(0, q). Since s(k - q - 1, k - 1) =
(s(q - k + 1, q)/s(0, q)) s(0, q), equation (3) holds for
n = k - 1 and j = k - q - 1.

Assumption 2: Suppose that equation (3) holds for n = k - 1
and k - q - 1 ≤ j ≤ m where m ≤ -2.

In other words we assume that we know that

\[ s(m, k-1) = \sum_{i=-m}^{q-k+1} \frac{s(i,q)\, s(i+k-1,q)}{s(0,q)\, s(q,q)}\, s(i+m,q) = \sum_{i=-m}^{q-k+1} \lambda_{k-1,q-k+1-i}\, s(i+m,q). \qquad (12) \]

We want to obtain the value of λ_{k-1,m-k+q+2} and use this to
show that equation (3) is valid for n = k - 1 and j = m + 1. So,

\[ s(m+1, k-1) = \sum_{i=-m}^{q-k+1} \frac{s(i,q)\, s(i+k-1,q)\, s(i+m+1,q)}{s(0,q)\, s(q,q)} + \lambda_{k-1,m-k+q+2}\, s(0,q), \qquad (13) \]


but s(m + 1, k - 1) = s(-m - 1, k - m - 2), where k - m - 2 ≥ k.
Then according to Assumption 1 we have,

\[ s(m+1, k-1) = s(-m-1, k-m-2) = \sum_{j=0}^{q-k+m+2} \frac{s(j,q)\, s(j+k-m-2,q)\, s(j-m-1,q)}{s(0,q)\, s(q,q)}. \qquad (14) \]


Consequently, the previous equation can now be rewritten as
follows,

\[ s(m+1, k-1) = \sum_{j=1}^{q-k+m+2} \frac{s(j,q)\, s(j+k-m-2,q)\, s(j-m-1,q)}{s(0,q)\, s(q,q)} + \frac{s(0,q)\, s(k-m-2,q)\, s(-m-1,q)}{s(0,q)\, s(q,q)}. \qquad (15) \]

If we make the transformation j = i + m + 1 in equation (15) we
obtain,

\[ s(m+1, k-1) = \sum_{i=-m}^{q-k+1} \frac{s(i+m+1,q)\, s(i+k-1,q)\, s(i,q)}{s(0,q)\, s(q,q)} + \frac{s(0,q)\, s(k-m-2,q)\, s(-m-1,q)}{s(0,q)\, s(q,q)}. \qquad (16) \]

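Now observe that the summations in equations (16) and (13) are
equal, thus we can deduce that

\[ \lambda_{k-1,m-k+q+2} = \frac{s(k-m-2,q)\, s(-m-1,q)}{s(0,q)\, s(q,q)} \]

and consequently equation (13) can be rewritten as

\[ s(m+1, k-1) = \sum_{i=-m-1}^{q-k+1} \frac{s(i,q)\, s(i+k-1,q)\, s(i+m+1,q)}{s(0,q)\, s(q,q)}. \qquad (17) \]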


Equation (17) demonstrates that equation (3) is valid for j = m + 1
and n = k - 1. Now, knowing that the initial equation
corresponding to n = k - 1 and j = k - q - 1 holds, we have
demonstrated that we can prove equation (3) to be valid for
n = k - 1 and k - q - 1 ≤ j ≤ -1. From expression (7) we know that,

\[ s_{k-1} = \sum_{i=0}^{q-k+1} \lambda_{k-1,i}\, s^i. \qquad (18) \]


We have already obtained the values of λ_{k-1,0} to λ_{k-1,q-k}, but we
still need to find λ_{k-1,q-k+1}. This is easily obtained if we consider
the following expression for the last non-identically zero element
of s_{k-1}:

\[ s(q, k-1) = \lambda_{k-1,q-k+1}\, s(q,q), \qquad (19) \]

\[ \lambda_{k-1,q-k+1} = \frac{s(q, k-1)}{s(q,q)}. \qquad (20) \]


Since we know all the λ's in (18), we can now write (18) in scalar
form:

\[ s(j, k-1) = \sum_{i=0}^{q-k+1} \frac{s(i,q)\, s(i+k-1,q)\, s(i+j,q)}{s(0,q)\, s(q,q)} \]

where j = k - 1 - q, ..., q. Given that equation (3) is valid for
n = q - 1 we have shown that it is valid for every n such that
q - 1 ≥ n ≥ 0.
Abstract

The problem of estimating the parameters of a non causal
ARMA system, driven by an unknown input noise with
unknown probability density function (PDF), is addressed. A
maximum likelihood approach is proposed in this paper. The
main idea of our approach is that the assumed PDF of the
input noise is the PDF minimizing the Fisher information
(FI) among PDFs matching the estimated cumulants up to
4th order. This minimization problem is hard to solve, so we
use an over-parameterized PDF model, which is a gaussian
mixture, and minimize the FI in this set. A new parameter
estimation method is given and its robustness properties
are detailed. The performance of the resulting
identification scheme is compared to that of another higher
order method.


1.  Introduction 

The identification of the parameters of a discrete linear
shift-invariant system by observation of its output is of
considerable interest in time series and spectral analysis, filtering
and prediction. In the non-gaussian case, numerous methods
based on higher order statistics (HOS) have been introduced
due to the fact that the output of these systems carries phase
information. Their main disadvantage is that they do not
provide any information about the theoretical performance
of the estimator and its optimality in the sense of the
covariance matrix of the estimated parameters.

To obtain an optimal estimator, it is necessary to know the
exact probability density function (PDF) of the input noise
in order to calculate the maximum likelihood (ML) estimator.
If the PDF is unknown, we can assume a certain class
of PDFs for the input and obtain optimality in the minimax
sense by using an ML approach with the PDF which
minimizes the Fisher information (FI) in this class and
provides the most robust (in Huber's sense) [3, 6] parameter
estimates.

In connection with higher order statistics, we consider
the class of cumulant-constrained PDFs and determine the
PDF which minimizes the FI under cumulant constraints.
This optimization problem was partially solved in [9], but the
results are limited to the symmetrical sub-gaussian PDFs.
The problem therefore remains open for super-gaussian and
non-symmetrical PDFs.

In this paper, we propose a new parameter estimation
method based on the prediction error method (PEM) using
cumulants of second, third and fourth order and the
minimization of the FI. We use a PDF model appropriate for
non-gaussian processes with heavy tails, which is a Gaussian
mixture (GM) PDF:

$$f_M(u) = p\,\phi_1(u) + (1 - p)\,\phi_2(u), \quad 0 \le p \le 1 \qquad (1)$$

where $\phi_1(u)$ and $\phi_2(u)$ are Gaussian PDFs. We consider the
case of a non-symmetrical PDF (see the symmetrical case in
[4]) constrained by the second, third and fourth order
cumulants, which are the ones used in practice.

In section 2, we present the procedure for the determination
of the set of centered GM PDFs having the same
variance, skewness and kurtosis (second, third and fourth
order cumulants), and the element of this set minimizing
the FI. This solution is given analytically for the null
skewness case. In section 3, the proposed parameter estimation
scheme is explained and the robustness properties of our
estimator are given. The estimation algorithm and simulation
results are presented in section 4. In section 5, a conclusion
is given.

2.  Model  for  the  input  PDF 

Here we introduce the cumulant-matched GM as the
model for the input distribution. Despite the fact that it does
not result from any constrained minimization or maximization of a
PDF measure, it has very useful characteristics and interesting
properties (see [4]).

So, let $C_2$, $C_3$ and $C_4$ be, respectively, the variance, the
skewness and the kurtosis of any PDF ($C_2 > 0$, $C_4 \ge
C_3^2/C_2 - 2C_2^2$). We will show that there always exists a
non-empty set $F_M$ of centered GMs (1) having these cumulants.
The problem is which mixture of this set to choose as the model
for the input PDF when the set contains more than one element.
We decide to take the mixture model of $F_M$ minimizing
the FI, i.e.

$$f_M^o = \arg\min_{f \in F_M} I_f \qquad (2)$$

where $I_f$ is the FI defined as

$$I_f = \int_{-\infty}^{+\infty} \frac{[f'(u)]^2}{f(u)}\,du \qquad (3)$$

With the GM, this integral can be evaluated only numerically.
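As an illustration of this numerical evaluation, the following Python sketch approximates the FI of a two-component Gaussian mixture (1) with the trapezoidal rule, assuming the usual location-parameter definition of the FI, i.e. the integral of $[f'(u)]^2/f(u)$. The function names, the integration window (`width`) and the grid size (`n`) are illustrative assumptions, not values taken from the paper.

```python
import math


def gauss(u, m, V):
    """Gaussian PDF with mean m and variance V, evaluated at u."""
    return math.exp(-(u - m) ** 2 / (2.0 * V)) / math.sqrt(2.0 * math.pi * V)


def fisher_information_gm(p, m1, V1, m2, V2, n=20001, width=10.0):
    """Trapezoidal approximation of int f'(u)^2 / f(u) du for the mixture
    f(u) = p*N(m1, V1) + (1 - p)*N(m2, V2).

    The window spans `width` standard deviations (of the wider component)
    beyond both means; this is a heuristic choice for the sketch and it
    under-resolves mixtures whose components have very different widths.
    """
    s = math.sqrt(max(V1, V2))
    lo = min(m1, m2) - width * s
    hi = max(m1, m2) + width * s
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * h
        g1 = p * gauss(u, m1, V1)
        g2 = (1.0 - p) * gauss(u, m2, V2)
        f = g1 + g2
        if f <= 0.0:
            continue  # density underflowed; the contribution is negligible
        df = -g1 * (u - m1) / V1 - g2 * (u - m2) / V2  # f'(u)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * df * df / f
    return total * h
```

For a single Gaussian of variance V (p = 1) the result should be close to 1/V, which gives a quick sanity check of the grid settings.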

Consider $m_1$, $m_2$ and $V_1$, $V_2$, respectively the means and
variances of $\phi_1(u)$ and $\phi_2(u)$ in the GM given by (1). To
obtain a centered GM with given variance $C_2$, skewness $C_3$
and kurtosis $C_4$, we must have:

$$\begin{cases}
p\,m_1 + (1 - p)\,m_2 = 0 & (a) \\
p\,(V_1 + m_1^2) + (1 - p)(V_2 + m_2^2) = C_2 & (b) \\
p\,m_1(3V_1 + m_1^2) + (1 - p)\,m_2(3V_2 + m_2^2) = C_3 & (c) \\
p\,(3V_1^2 + 6m_1^2 V_1 + m_1^4) + (1 - p)(3V_2^2 + 6m_2^2 V_2 + m_2^4) - 3C_2^2 = C_4 & (d)
\end{cases} \qquad (4)$$

We can see that if $p = 0$ or $p = 1$, the PDF $f_M(u)$ (1) is
Gaussian with variance $C_2$, which is possible only if $C_3 =
C_4 = 0$. So, if $C_3 \ne 0$ or $C_4 \ne 0$, then $0 < p < 1$.
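The constraints in system (4) can be checked numerically. The short Python sketch below evaluates the left-hand sides of (4a) to (4d) for a given mixture; the function name `gm_cumulants` and the example values are illustrative assumptions.

```python
def gm_cumulants(p, m1, V1, m2, V2):
    """Mean and 2nd/3rd/4th-order cumulants of the mixture
    p*N(m1, V1) + (1 - p)*N(m2, V2), following system (4).

    The C2, C3, C4 expressions are cumulants only when the returned mean
    is zero (centered mixture), which is what (4a) enforces.
    """
    q = 1.0 - p
    mean = p * m1 + q * m2                                   # (4a)
    C2 = p * (V1 + m1 ** 2) + q * (V2 + m2 ** 2)             # (4b)
    C3 = p * m1 * (3.0 * V1 + m1 ** 2) + q * m2 * (3.0 * V2 + m2 ** 2)  # (4c)
    C4 = (p * (3.0 * V1 ** 2 + 6.0 * m1 ** 2 * V1 + m1 ** 4)
          + q * (3.0 * V2 ** 2 + 6.0 * m2 ** 2 * V2 + m2 ** 4)
          - 3.0 * C2 ** 2)                                   # (4d)
    return mean, C2, C3, C4
```

For example, the centered mixture with p = 0.25, means 3 and -1, and variances 1 and 2 has C2 = 4.75, C3 = 3.75 and C4 = -14.4375.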

Now, the problem is to determine the set of solutions of
the system (4). To do this, we use a new parameterization:

$$\begin{cases} m_1 = k(1 - p) \\ m_2 = -kp \end{cases} \qquad (5)$$

where $k$ is real. The relation (4a) is then always satisfied and
$m_1$ and $m_2$ can take all values.

For convenience, we consider now that $k \ne 0$; the case $k = 0$
will be detailed later. Using (4b) and (4c), we get
the variances $V_1$ and $V_2$ as functions of $p$ and $k$:

$$\begin{cases}
V_1 = C_2 + \dfrac{C_3}{3kp} - \dfrac{k^2(1 - p^2)}{3} & (a) \\
V_2 = C_2 - \dfrac{C_3}{3k(1 - p)} - \dfrac{k^2 p(2 - p)}{3} & (b)
\end{cases} \qquad (6)$$

Next, by substituting (6) into (4b), (4c), (4d) and combining
these, we obtain the equation linking $p$ and $k$:

$$p^2(1 - p)^2(p^2 - p + 1)\,k^6 + p(1 - p)(4p - 2)\,C_3 k^3 + \tfrac{3}{2}\,p(1 - p)\,C_4 k^2 - \tfrac{1}{2}\,C_3^2 = 0 \qquad (7)$$

This equation is analytically solvable only if $C_3 = 0$ and $k \ne 0$,
hence if $C_4 < 0$. In the general case, we solve (7) numerically
in order to determine the pairs $(p, k)$, with $0 < p < 1$ and
$k \ne 0$, that are solutions of (7). We can remark immediately that, if
the pair $(p, k)$ is a solution of (7), then so is the pair $(1 - p, -k)$,
and the two resulting mixtures are identical.
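A minimal Python sketch of the numerical step is given below. It treats (7) as a polynomial in $k$ for a fixed $p$, brackets its positive roots on a grid and refines them by bisection. The closed forms used for $V_1$ and $V_2$ are obtained here by solving (4b) and (4c) under the parameterization (5), so they should be read as a derivation under those assumptions. The grid bounds, the function names and the root finder are illustrative choices; the paper does not state which numerical solver was used.

```python
def eq7_residual(p, k, C3, C4):
    """Left-hand side of equation (7) for a candidate pair (p, k)."""
    q = 1.0 - p
    return (p * p * q * q * (p * p - p + 1.0) * k ** 6
            + p * q * (4.0 * p - 2.0) * C3 * k ** 3
            + 1.5 * p * q * C4 * k ** 2
            - 0.5 * C3 * C3)


def variances(p, k, C2, C3):
    """V1, V2 from (4b)-(4c) with m1 = k(1-p), m2 = -kp (requires k != 0)."""
    q = 1.0 - p
    V1 = C2 + C3 / (3.0 * k * p) - k * k * (1.0 - p * p) / 3.0
    V2 = C2 - C3 / (3.0 * k * q) - k * k * p * (2.0 - p) / 3.0
    return V1, V2


def positive_roots_k(p, C3, C4, k_max=50.0, n_grid=2000):
    """Positive roots k of (7) for fixed p, by grid scan plus bisection.

    k_max and n_grid are arbitrary assumptions; roots closer together
    than one grid cell may be missed.
    """
    roots = []
    prev_k = k_max / n_grid
    prev_g = eq7_residual(p, prev_k, C3, C4)
    for i in range(2, n_grid + 1):
        k = k_max * i / n_grid
        g = eq7_residual(p, k, C3, C4)
        if prev_g == 0.0:
            roots.append(prev_k)          # a grid point is an exact root
        elif prev_g * g < 0.0:
            lo, hi, g_lo = prev_k, k, prev_g
            for _ in range(100):          # fixed number of bisection steps
                mid = 0.5 * (lo + hi)
                g_mid = eq7_residual(p, mid, C3, C4)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            roots.append(0.5 * (lo + hi))
        prev_k, prev_g = k, g
    return roots
```

A pair (p, k) found this way is admissible only if both variances returned by `variances` are non-negative; scanning p over (0, 1) then traces out the candidate set.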

If $k = 0$, then we cover the class of symmetrical super-gaussian
PDFs studied in [4]. The following proposition ensues:

Proposition 1: Let $C_2$, $C_3$ and $C_4$ be, respectively, the
variance, the skewness and the kurtosis of any PDF. Then there
exists a non-empty set $F_M$ of centered GMs having these
cumulants. We have to distinguish three cases:

(i) $C_3 \ne 0$: $F_M$ is characterized by the pairs $(p, k)$,
where $0 < p < 1$ and $k > 0$, solutions of the equation
(7) and for which the variances $V_1$ and $V_2$ of the two Gaussian
PDFs of the mixture are non-negative, $m_1$ and $m_2$ being
given by (5). $F_M$ is noted $GM_{p,k}$.

(ii) $C_3 = 0$ and $C_4 < 0$: then $k \ne 0$. $F_M$ is characterized
by the pairs $(p_1, k)$ and $(1 - p_1, k)$, where


and  (-8C4)?  <  k  <  to  have  0  <  pi  <  1  and  V^i 

and  V2  non  negative.  Fm  is  noted  SBGMk. 

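(iii) $C_3 = 0$ and $C_4 > 0$ (super-gaussian): then $k = 0$.
$F_M$ is characterized by

$$\begin{cases}
m_1 = m_2 = 0 \\
V_1 = C_2 - a \\
V_2 = C_2 + \dfrac{C_4}{3a} \\
p = \dfrac{1}{1 + 3a^2/C_4}
\end{cases} \qquad (9)$$

$F_M$ is noted $SGM_a$.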

Remark 1: For the cases (i) and (ii), we show easily that
in the boundary case $C_4 = C_3^2/C_2 - 2C_2^2$, the sets $GM_{p,k}$
and $SBGM_k$ contain a unique PDF, which corresponds to
the Bernoulli distribution.

Now the determination of the PDF $f_M^o$ (2) of $F_M$ which
minimizes the FI leads to the following results:

Proposition 2: The GM of $SBGM_k$ minimizing the
Fisher information (3) is $f_M^o$, where $F_M = SBGM_k$ in (2),
with

$$\begin{cases}
m_1 = -m_2 = \left(-\dfrac{C_4}{2}\right)^{1/4} \\
V_1 = V_2 = C_2 - \sqrt{-\dfrac{C_4}{2}} \\
p = \dfrac{1}{2}
\end{cases} \qquad (10)$$

It is quite natural that the solution is the symmetrical
PDF (sub-gaussian PDFs class) of the set $SBGM_k$, because
it is the most Gaussian of this set (all its cumulants of odd
order are null), in the sense that the minimum value of the
FI, for any PDF, is reached for the Gaussian PDF [7].
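The symmetric sub-gaussian model of Proposition 2 is simple enough to be written out directly. The sketch below builds the mixture parameters from $C_2$ and $C_4 < 0$, reading (10) as $m_1 = -m_2 = (-C_4/2)^{1/4}$, $V_1 = V_2 = C_2 - \sqrt{-C_4/2}$ and $p = 1/2$; the function name and the error handling are assumptions made for illustration.

```python
import math


def sub_gaussian_model(C2, C4):
    """Parameters (p, m1, V1, m2, V2) of the symmetric two-component
    mixture with variance C2 and negative fourth-order cumulant C4."""
    if C4 >= 0.0:
        raise ValueError("this model needs a negative fourth-order cumulant")
    s = math.sqrt(-C4 / 2.0)      # equals m1**2
    V = C2 - s
    if V < 0.0:
        raise ValueError("C4 is below the admissible bound -2*C2**2")
    m = math.sqrt(s)
    return 0.5, m, V, -m, V
```

With C2 = 1 and C4 = -0.5, this gives means of about +0.707 and -0.707 and a common variance of 0.5, so the mixture variance is V + m^2 = 1 and its fourth-order cumulant is -2 m^4 = -0.5, as required.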

Proposition 3: The mixture PDF model of $SGM_a$ minimizing
$I_f$ is obtained for $a \to 0$, and then $C_2 I_f \to 1$.

Due to Proposition 3, it seems that the model of
Proposition 1 (iii), when $a$ tends to 0 but is not 0, is an
$\epsilon$-approximation of the solution of the FI minimization
under constraints of $C_2$ and $C_4$ for the class of super-gaussian
PDFs, since the absolute minimum of $C_2 I_f$ is 1, obtained for
the Gaussian PDF. In practice, $a$ is taken small enough (see
[3]).
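For the super-gaussian case, a zero-mean two-component mixture with $V_1 = C_2 - a$ is fully determined by the variance and fourth-order cumulant constraints. The Python sketch below solves those two constraints, which gives $V_2 = C_2 + C_4/(3a)$ and $p = C_4/(C_4 + 3a^2)$. This is a derivation under the stated parameterization, and the function name is an assumption.

```python
def super_gaussian_model(C2, C4, a):
    """Zero-mean mixture p*N(0, V1) + (1 - p)*N(0, V2) with variance C2
    and positive fourth-order cumulant C4, parameterized by 0 < a < C2."""
    if C4 <= 0.0 or not (0.0 < a < C2):
        raise ValueError("need C4 > 0 and 0 < a < C2")
    p = C4 / (C4 + 3.0 * a * a)
    V1 = C2 - a
    V2 = C2 + C4 / (3.0 * a)
    return p, V1, V2
```

As $a$ shrinks, $p$ tends to 1 and $V_1$ tends to $C_2$, so the mixture approaches a Gaussian of variance $C_2$ with a rare, very wide second component; this matches the limiting behaviour described in Proposition 3.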


3.  ARMA  parameters  estimation 


Let the observed process $\{y_t\}$ be modeled as the output
of a discrete stable linear shift-invariant system with
input $\{e_t\}$:

$$y_t = H_{\theta_0}(z)\, e_t \qquad (11)$$

where

$$H_{\theta_0}(z) = \frac{A(z)\,C(z^{-1})}{B(z)\,D(z^{-1})} \qquad (12)$$

with $a_0 = b_0 = c_0 = d_0 = 1$ and

$$\theta_0 = [a_1 \ldots a_{n_a}\; b_1 \ldots b_{n_b}\; c_1 \ldots c_{n_c}\; d_1 \ldots d_{n_d}]^T \qquad (13)$$

In Huber's sense [3, 6], if the true PDF $f_e$ belongs to
the class $F_M$, we obtain the most robust estimator (16) of
the estimator class generated by $F_M$ with the norm $\ell(\cdot)$.
It is possible to show that the proposed estimator $\hat{\theta}$ (16) is
asymptotically optimal in the minimax sense for the particular
class of PDFs $F_M$. Under some assumptions (see [7]),
the estimate (16) is consistent and the following expressions
hold


r  y/N(0-eo)  ~  (a)  .,7. 

1  =  (6) 

On the other hand, we find that

$$\mathrm{cov}(\hat{\theta} - \theta_0) \ge N^{-1} V^* \qquad (18)$$

Thus, for $f_e = f_M^o$, the asymptotic covariance $V(f_M^o, f_e)$
of the proposed estimate (16) reaches the lowest possible
bound $V^*$, which depends on the FI of $f_M^o$ and on $\theta_0$. Its
calculation is detailed in [7]. For other $f_e \in F_M$, the asymptotic
covariance does not exceed $V^*$. If $f_e \notin F_M$, only the
relation (17a) holds. In all cases, $V(f_M^o, f_e)$ is obtained
theoretically with the results of [7] and [5].


We assume that all the roots of $A(z)$ and $B(z)$ are inside the
unit circle (causal minimum phase part) and all the roots of
$C(z)$ and $D(z)$ are outside the unit circle (anti-causal maximum
phase part). The input $\{e_t\}$ is an independent identically
distributed (i.i.d.) sequence with unknown PDF $f_e(u)$.

Given $N$ consecutive samples of the system output $y_t$,
$t = 1, \ldots, N$, we want to estimate the actual parameter $\theta_0$.
The prediction error sequence $\{w_t(\theta)\}$ [5, 7] is related to the
data through

$$w_t(\theta) = H_{\theta}^{-1}(z)\, y_t \qquad (14)$$

With PEMs, the estimate $\hat{\theta}$ of $\theta_0$ is the value of $\theta$ which
minimizes some criterion depending on the sequence of prediction
errors

$$J(\theta) = \frac{1}{N} \sum_{t=1}^{N} \ell(w_t(\theta)) \qquad (15)$$

where $\ell(\cdot)$ is a scalar-valued norm, i.e.

$$\hat{\theta} = \arg\min_{\theta} J(\theta) \qquad (16)$$

Since $f_e(u)$ is unknown, there are two possibilities: either
choosing a norm giving satisfactory results for a broad class
of input PDFs (robustification), or estimating $f_e(u)$ from the
available data. Our approach is closest to robust identification,
in the sense that we take the PDF $f_M^o$ (2) of the
class $F_M$, which is $GM_{p,k}$, $SBGM_k$ or $SGM_a$ depending
on the values of the second, third and fourth order cumulants
of the PDF $f_e(u)$. The
criterion to minimize is $J(\theta)$ with the norm $\ell(w) = \ell^o(w) =
-\log[f_M^o(w)]$ (ML approach).
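The ML criterion can be sketched in a few lines of Python once a prediction error sequence is available. The code below evaluates the average of $-\log f(w_t)$ for a two-component Gaussian mixture; computing the prediction errors themselves (the inverse filtering of (14) for a non-causal model) is outside this sketch, and the function names are assumptions.

```python
import math


def gm_pdf(w, p, m1, V1, m2, V2):
    """Two-component Gaussian mixture PDF evaluated at w."""
    g1 = math.exp(-(w - m1) ** 2 / (2.0 * V1)) / math.sqrt(2.0 * math.pi * V1)
    g2 = math.exp(-(w - m2) ** 2 / (2.0 * V2)) / math.sqrt(2.0 * math.pi * V2)
    return p * g1 + (1.0 - p) * g2


def ml_criterion(errors, p, m1, V1, m2, V2):
    """Average negative log-likelihood of the prediction errors under
    the mixture model, i.e. J = (1/N) * sum(-log f(w_t))."""
    return sum(-math.log(gm_pdf(w, p, m1, V1, m2, V2)) for w in errors) / len(errors)
```

When both components are standard normal, the criterion reduces to the Gaussian case: errors that are all zero give J = 0.5 log(2 pi).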


4.  Algorithm  and  simulation  results 

Each step of the algorithm consists of three parts:

1) Estimate the cumulants of the prediction error process
$w_t$ (14).

2) Calculate the model $f_M^o$ for the input PDF by (2),
based on the estimated cumulants of $w_t$. According to the
values of these cumulants, we choose between the numerical
procedure given in Proposition 1 and the models (10) or (9)
for the calculation of $f_M^o$.

3) Find the minimum of the criterion (15) (with $\ell(w) =
\ell^o(w)$) in the search direction of a quasi-Newton algorithm,
calculated with the input model $f_M^o$. This calculation is not
detailed here, but it is similar to the one presented in [7].
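Steps 1 and 2 above can be sketched as follows. The Python code estimates the second, third and fourth order cumulants of a prediction error sequence with plain (biased) sample moments and then picks the model class of Proposition 1. The tolerance used to decide whether a cumulant is "zero" is an assumption of this sketch; the paper does not say how that decision is made on estimated values.

```python
def sample_cumulants(w):
    """Biased sample estimates of the 2nd, 3rd and 4th order cumulants."""
    n = len(w)
    mean = sum(w) / n
    c = [x - mean for x in w]
    m2 = sum(x * x for x in c) / n
    m3 = sum(x ** 3 for x in c) / n
    m4 = sum(x ** 4 for x in c) / n
    return m2, m3, m4 - 3.0 * m2 * m2


def choose_model_class(C3, C4, tol=1e-3):
    """Select the mixture family from the estimated cumulants.

    tol is a hypothetical threshold, not a value from the paper.
    """
    if abs(C3) > tol:
        return "GM_pk"      # non-zero skewness: solve (7) numerically
    if C4 < -tol:
        return "SBGM_k"     # symmetric sub-gaussian: model (10)
    if C4 > tol:
        return "SGM_a"      # symmetric super-gaussian: model (9)
    return "Gaussian"
```

For instance, a sequence alternating between -1 and +1 has unit variance, zero third-order cumulant and fourth-order cumulant -2, so it falls in the sub-gaussian class.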

In the initialization phase of our ML approach, any
4th-order method can be used, for example the w-slice
algorithm [8], to avoid convergence to false local minima.

To demonstrate the asymptotic efficiency of our ML
approach, we ran many simulations with a non-causal
ARMA model driven by different symmetrical (belonging
to the sub- or super-gaussian class of PDFs) or non-symmetrical
input noises. We took a model from [2] and inverted its
causal real pole to obtain our non-causal model. It has poles
at 5 and 0.6179 ± i0.5077 and zeros at -0.7 and 2, so its
transfer function is given by:

$$H(z) = \frac{(1 + 0.7z^{-1})(1 - 0.5z)}{(1 - 1.2358z^{-1} + 0.6396z^{-2})(1 - 0.2z)}$$
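The stated pole and zero locations can be cross-checked against the polynomial coefficients with a few lines of Python. The sketch below only uses the numbers quoted above; the helper name `quadratic_roots` is an assumption.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*z**2 + b*z + c = 0 (complex arithmetic)."""
    d = cmath.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)


# Causal denominator: 1 - 1.2358 z^-1 + 0.6396 z^-2 = 0
# is equivalent to z^2 - 1.2358 z + 0.6396 = 0.
causal_poles = quadratic_roots(1.0, -1.2358, 0.6396)

# Anti-causal denominator factor: 1 - 0.2 z = 0.
anticausal_pole = 1.0 / 0.2

# Numerator factors: 1 + 0.7 z^-1 = 0 and 1 - 0.5 z = 0.
zeros = (-0.7, 1.0 / 0.5)
```

The complex pole pair lies inside the unit circle (causal part), while the real pole and the zero coming from the factors in z lie outside it (anti-causal part), consistent with the non-causal model described in the text.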

The algorithm was tested by simulation on the presented
model, considered as unknown and driven by three different
input noises: Laplacian (I) (super-gaussian), uniform
(II) (sub-gaussian) or exponential (III). 100 independent
Monte-Carlo runs were performed for each simulation.
The signal length used is N = 2000 samples. We compared
these results with a method [1] (noted LS+max$|\hat{K}|$),
where the spectrally equivalent minimum phase system is
first identified using the least squares method (LS). Then,
among all the spectrally equivalent systems, we choose the
model which maximizes the absolute value of the estimated
normalized kurtosis of the innovation process.

Table 1. ARMA parameter estimates (N = 2000, 100 Monte-Carlo runs).

Input  True parameter    ML                  LS+max|K|
                         Mean      Std       Mean      Std
I      a1 =  0.7000       0.7018   0.0246     0.7003   0.0267
       b1 = -1.2358      -1.2243   0.0295    -1.2266   0.0412
       b2 =  0.6396       0.6329   0.0270     0.6358   0.0436
       c1 = -0.5000      -0.4756   0.1329    -0.4838   0.1485
       d1 = -0.2000      -0.1865   0.1525    -0.1969   0.1687
II     a1 =  0.7000       0.7046   0.0231     0.6996   0.0255
       b1 = -1.2358      -1.2299   0.0306    -1.2267   0.0408
       b2 =  0.6396       0.6337   0.0276     0.6356   0.0448
       c1 = -0.5000      -0.4846   0.1308    -0.4932   0.1361
       d1 = -0.2000      -0.1862   0.1443    -0.2061   0.1585
III    a1 =  0.7000       0.7002   0.0136     0.7038   0.0266
       b1 = -1.2358      -1.2336   0.0162    -1.2262   0.0440
       b2 =  0.6396       0.6369   0.0175     0.6360   0.0430
       c1 = -0.5000      -0.4998   0.0560    -0.4839   0.1514
       d1 = -0.2000      -0.2030   0.0624    -0.1934   0.1719

In Table 1, the mean and the standard deviation (Std)
of the parameter estimates are summarized for the different
input noises. The presented results show the good behaviour
of our method compared to the LS+max$|\hat{K}|$ method,
with small bias and Std.


5. Conclusion

A possible way to obtain efficient parameter estimates
in the case of an unknown non-gaussian input is presented. The
innovation of the proposed PDF model is that it is the element,
minimizing the FI, of a set of GMs having the same first four
cumulants as the true input PDF. This PDF model is
parameterized by its second and fourth order cumulants for the
classes of null skewness PDFs, and determined numerically
for the non-zero skewness PDFs class. An interesting result
has been obtained in the super-gaussian case, for which the
PDF model (9) seems to be an $\epsilon$-approximation of the solution
of the more general problem of FI minimization under
constraints of $C_2$ and $C_4$.

Simulation results seem to confirm the good behaviour
and robustness of our method compared to other methods
based on higher-order statistics. Future work will address
means to find the analytical form of the PDF in the case
of non-symmetrical PDFs.

Abstract

The  Minimum  Entropy  Method  is  studied  with  regard 
to  its  performance  in  removing  multipath  distortion  from 
passive  transients,  to  improve  the  performance  of 
classifiers.  It  was  found  that  the  method  often  works  well 
if  the  kurtosis  of  the  associated  multipath  Green's  function 
is  high  enough,  and  that  signal  stationarity  is  not  required. 
We  also  found  that,  while  there  are  usually  a  few  filter 
lengths  at  which  the  best  solutions  are  obtained  with 
conventional  convergence  criteria,  good  solutions  exist 
across  a  much  broader  range  of  filter  lengths  if  the 
iterations  are  not  allowed  to  proceed  to  convergence.  That 
is, kurtosis needs to be increased, but not maximized. In
many cases, two or three iterations are sufficient.

1.  Introduction 

The  passive  sonar  classification  problem  can  be 
decomposed  into  two  stages:  1)  recovering  the  source  time 
signature  of  a  transient  event  from  a  set  of  received 
signals  by  accounting  for  environmental  distortion  effects, 
and  2)  applying  a  pattern  recognition  algorithm  to  the 
estimated  source  signature  for  final  classification.  By 
environmental  distortion,  we  refer  to  effects  present  in  the 
received  data  at  the  sensor  array  that  are  not  present  in  the 
source  signature.  In  our  case,  environmental  effects  consist 
primarily  of  multipath  and  low  level  ambient  noise.  For  a 
spatial  point  source,  if  we  incorporate  the  environmental 
effects  into  a  Green's  function,  and  assume  time- 
invariance,  the  received  pressure  time  series  at  a  desired 
location  can  be  modeled  as  the  convolution  of  the  transient 
source signature with the Green's function. A term
representing  additive  noise  effects  can  be  added  to  the 
convolution.  The  Green's  function,  of  course,  depends  on 
the  environmental  acoustic  parameters  and  the  source  and 
receiver  location. 

When  a  Green's  function  has  been  determined  by 
numerical  solution  of  the  wave  equation,  it  can  be  used  to 
deconvolve  the  measured  time  series  for  an  estimate  of  the 
source  signature,  which  is  referred  to  as  the  deterministic 
approach. Broadhead et al. [1] reviewed this approach, and
performed  an  additional  study  in  a  bottom-limited 


propagation  environment,  showing  that  there  was  extreme 
sensitivity  to  inaccuracy  in  the  bottom  geoacoustic 
parameters. 

Broadhead  [2]  used  a  statistical  source  estimation 
approach  to  address  the  problem  of  recovering  a  source 
signature  without  specific  knowledge  of  its  location,  or  the 
environmental  parameters  necessary  to  accurately  compute 
the  Green's  functions.  He  gave  examples,  for  the  single 
channel  case,  where  this  can  be  done  if  the  Green's 
functions  representing  environmental  distortion  are  lepto- 
kurtic  (a  specific  type  of  non-Gaussianity).  The  method 
used,  called  the  minimum  entropy  deconvolution  method 
(MED),  was  introduced  by  Wiggins  in  1977  [3].  This 
method  was  further  refined  and  interpreted  by  various 
researchers (see bibliography in Ref. [4]). The goal of this
method  is  to  produce  a  filter  that  drives  the  output  of  the 
system  to  lower  entropy  (greater  order),  or  equivalently, 
drive  the  governing  distribution  more  towards  non- 
Gaussianity  (higher  kurtosis).  The  success  of  MED 
depends  on  the  non-Gaussianity  of  the  input  random 
process,  but  apparently  does  not  require  stationarity 
(examples  are  given  in  [2]). 

In  this  paper  we  continue  the  work  begun  in  [2]  with  a 
more  thorough  and  systematic  exploration  of  the  solution 
space  provided  by  the  MED  parametric  method.  The  results 
in  this  paper  show  that  exploitation  of  higher  order 
parametric  methods  to  achieve  classification  performance 
gains  for  nonstationary  sonar  signals  appears  promising. 


2.  MED  Algorithm 

The MED algorithm has been thoroughly described in
the literature, and will not be repeated here, but a
minimum of terminology must be defined. We seek the
MED filter $f$ of length $N$ that is a stationary point of the
functional

$$V(g) = \frac{\sum_j g_j^4}{\left(\sum_j g_j^2\right)^2} \qquad (1)$$

where

$$g_j = \sum_{k=1}^{N} f_k\, x_{j-k} \qquad (2)$$

$g$ is the Green's function estimate, $x$ is the input signal
and $V$ is the Varimax norm (essentially kurtosis). The
resulting nonlinear system of equations is solved
iteratively. A starting point is given by taking $f$ as a
delta function. After iterating to some stopping criterion
(to be discussed), we are left with the filter $f$ and the
Green's function estimate $g$. To obtain an estimate of the
source signature $s$, we calculate the inverse of $f$.
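To make the terminology concrete, the Python sketch below computes the output of a candidate filter and its Varimax norm, taken here in the standard form used for minimum entropy deconvolution (sum of fourth powers divided by the squared sum of squares). It does not implement the MED iteration itself; the function names and the indexing convention of the convolution are assumptions.

```python
def convolve(f, x):
    """Full linear convolution of filter f with signal x."""
    out = [0.0] * (len(f) + len(x) - 1)
    for i, fi in enumerate(f):
        for j, xj in enumerate(x):
            out[i + j] += fi * xj
    return out


def varimax(g):
    """Varimax norm: sum(g^4) / (sum(g^2))^2.

    It equals 1 for a single spike and 1/n for a constant sequence of
    length n, so larger values indicate a spikier (higher kurtosis) output.
    """
    s2 = sum(v * v for v in g)
    s4 = sum(v ** 4 for v in g)
    return s4 / (s2 * s2)
```

Starting from a delta-function filter, `convolve([1.0], x)` returns the input unchanged, so the initial Varimax value is simply that of the received signal; each MED iteration then tries to raise it.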


Fig. 1. Signal type examples for the 4300 m
range. (a) Data, (b) PE Green's function, (c)
Short pulse simulation, (d) Long pulse
simulation. Horizontal axis: time (s), 0.0 to 2.0.


3.  Signal  Description 

We have three types of input signals: 1) data, 2) short
pulse simulations (SPSIMUL), and 3) long pulse
simulations (LPSIMUL). The data analyzed was obtained
in an experiment conducted in the Atlantic Ocean, in the
vicinity of Blake Plateau. For details, refer to Refs. [1],
[2], and [5]. A typical time series is shown in Fig. 1(a)
(250 m receiver depth, 4.3 km source-to-receiver range).
The bottom interacting events occur after about 0.4
seconds.

In Fig. 1(b) we show the corresponding calculated (PE
model)  Green's  function.  We  used  this  and  the  two  pulses 
shown  in  Fig.  2  to  create  (by  convolution)  two  types  of 
simulations:  1)  a  short  simulation  representative  of  the 
data,  as  shown  in  Fig.  1(c),  and  2)  a  long  pulse 
simulation,  shown  in  Fig.  1(d),  that  creates  more  overlap 
between  the  various  arrivals.  We  have  displayed  only  the 
4.3  km  range,  but  have  also  processed  the  600  meter  and 
7.9  km  ranges.  Refer  again  to  the  above  references  for 
more  examples  of  time  series. 


In  Fig.  2,  as  mentioned,  we  display  the  two  pulse 
types.  In  2(a),  the  short  pulse  is  our  best  estimate  from 
measurements  from  a  source  array  mounted  hydrophone  of 
the  true  source  pulse  on  the  data.  The  longer  pulse  in  2(b) 
is  an  exponentially  damped  sinusoid. 


4.  Processing  Methodology 


We developed two basic processing methodologies,
which will be referred to as CONVRG and BEST.
CONVRG uses a conventional convergence criterion, and
the output is a correlation coefficient between the source
estimate and the known source at each filter length from 1
to 50. (The correlation coefficient, denoted by the symbol γ,
is standard except that we always report its absolute
value.) The convergence criterion was as follows: the
correlation coefficient is calculated between the current
MED filter iterate and the previous one. When this value
exceeds the specified tolerance, the iteration is stopped. We
used a tolerance level of 0.9999.
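The stopping rule described above can be sketched in Python as follows. The sketch assumes that "standard" means the Pearson (mean-removed) correlation coefficient, which the text does not spell out, and the function names are illustrative.

```python
import math


def abs_corrcoef(a, b):
    """Absolute value of the Pearson correlation between two sequences
    of equal length. Returns 0.0 if either sequence is constant."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    if da == 0.0 or db == 0.0:
        return 0.0
    return abs(num / (da * db))


def has_converged(f_prev, f_curr, tol=0.9999):
    """CONVRG-style test: stop when successive filter iterates correlate
    above the tolerance (0.9999 is the value quoted in the text)."""
    return abs_corrcoef(f_prev, f_curr) > tol
```

Because the absolute value is used, a filter iterate that only flips sign relative to the previous one is treated as unchanged.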


Fig. 3. Results for LPSIMUL, 600 m case.

We mimic the case of doing no preprocessing before
classification, that is, just correlating the received signal
with the source signature. This value gives a measure of
how much distortion was introduced by the multipath, and
will be indicated by short dashes in the figures. The output
of CONVRG will be indicated by a solid curve.


Fig. 4. (a) CONVRG/BEST (Solid/Dash)
results for SPSIMUL, 7900 meter case, (b)
Number of iterations for results in (a).
Horizontal axis: filter length, 10 to 50.

BEST outputs the correlation coefficient between the
best possible source estimate and the known signature at
each N out of a possible itermax iterations, without regard
to actually trying to maximize V. itermax was
either 30 or 40 iterations. This curve will be represented
by long dashes in the figures. In both cases, the number of
iterations actually performed at each filter length and the
estimated Green's function kurtosis were also output.

There were two stages in both algorithms where some
regularization could be required: 1) on a given iteration, the
Toeplitz coefficient matrix, which may become
ill-conditioned, and 2) the calculation of the inverse of the
MED filter, which may have spectral zeros (a frequency
domain method was used). The data and SPSIMUL cases
used a pre-whitening value of 0.01% for stage 1). The
4300 m range of LPSIMUL also used this value. No
pre-whitening was used for the other two ranges. In no case
was pre-whitening used for the inverse filter. A definite
sensitivity to the amount of pre-whitening was noticed.
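The text does not define how the pre-whitening percentage is applied. A common convention in deconvolution practice, assumed in the sketch below, is to increase the zero-lag autocorrelation (the diagonal of the Toeplitz matrix) by the stated percentage, which improves the conditioning of the system. The function name is hypothetical.

```python
def prewhiten(autocorr, percent):
    """Return a copy of the autocorrelation sequence with its zero-lag
    term increased by `percent` percent (e.g. 0.01 for 0.01%).

    Assumed convention: only the Toeplitz diagonal is modified.
    """
    r = list(autocorr)
    r[0] *= 1.0 + percent / 100.0
    return r
```

With `percent=0.01`, a zero-lag value of 2.0 becomes 2.0002 while all other lags are left untouched.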


5.  Results 

In Table I we summarize the results in the form of
correlation coefficients between the estimated and known
pulses. For the different signal type and processing
methodology combinations, only the highest coefficient
obtained is reported in each case. In most cases the highest
coefficients obtained for BEST and CONVRG were
comparable. For some cases, however, they were significantly
different at other filter lengths, which we will discuss
later. LPSIMUL results were typically better than
SPSIMUL. We will also speculate as to why later.

Range (m)          600     4300    7900
DATA/CONVRG       0.896   0.802   0.822
DATA/BEST         0.902   0.865   0.858
SPSIMUL/CONVRG    0.975   0.946   0.864
SPSIMUL/BEST      0.975   0.958   0.893
LPSIMUL/CONVRG    0.995   0.963   0.992
LPSIMUL/BEST      0.995   0.965   0.993

TABLE I Summary of highest correlation
coefficients for all cases.

In all cases, the best results were a significant
improvement over doing no preprocessing of this kind,
where the values of γ are less than 0.7. As would be expected,
the results were better for the simulations than for the real
measurements. Not only did the data have some noise, but
the "true" source pulse is not known with complete
accuracy.

In  Fig.  3,  we  display  results  for  LPSIMUL,  600  m. 
This  figure  demonstrates  that  conventional  convergence 
criteria  can  work  very  well.  This  also  happens  to  be  a  case 
where  the  associated  Green's  function  kurtosis  was  very 
high  (124.6).  The  resulting  simulated  data  kurtosis  (after 
convolution  with  the  long  pulse)  was  26.2. 

We consider a short pulse simulation example in Fig.
4(a) for the 7900 m range. This figure shows that for some
cases, beyond a certain value of N, the algorithm
performance drops off significantly (often worse than doing
nothing) for conventional convergence criteria. However,
the long dashed line (BEST) shows that good solutions are
available, only at far fewer iterations (often only 2) than
needed to maximize V. This is shown in Fig. 4(b), where
the solid curve represents the number of iterations required
to satisfy the conventional convergence criterion
(CONVRG), and the long-dash curve represents the
number of iterations associated with the best possible
source estimate in 40 or fewer iterations.

In contrast to the previous case, the initial Green's
function kurtosis was much lower (27.3). Also, the change
in kurtosis after convolution with the short pulse was smaller
(kurtosis of simulated data = 14.8). Note that the starting
kurtosis for this signal is comparable to the final kurtosis
of the convolved signal in the example in Fig. 3. These
factors are probably significant for determining when
conventional convergence criteria will or will not work
well.

In Fig. 5, we display the kurtosis of the MED Green's
function  estimates  produced  by  both  CONVRG  and  BEST, 
as  a  function  of  filter  length.  This  figure  is  fairly  typical 
of  the  results.  It  shows  a  steady  increase  in  the  final 
(maximized)  kurtosis  value  for  CONVRG,  but  a  level 
average  value  for  BEST,  indicating  that  the  most  accurate 
estimate  of  the  Green’s  function  is  not  necessarily  the 
estimate  with  the  highest  kurtosis. 

In Fig. 6, we show pulse estimates for the highest and
lowest correlation coefficients for SPSIMUL/CONVRG.
This gives a rough idea of the visual quality of match
expected for the range of correlation coefficients relevant to
this study. In Fig. 6(a), for the 600 m case, γ = 0.975.
In Fig. 6(b), γ = 0.864, and the range is 7900 m.


Fig.  5.  Green's  function  estimate  kurtosis 
for  SPSIMUL,  7900  meter  case. 

6.  Discussion  and  Conclusions 

We have found that, for the cases studied, the processor
is always capable of providing better results than doing no
preprocessing at all. In most cases, the best results were
very good (γ > 0.9). Even when the processor gave good
results only over a narrow range of filter lengths, many
good solutions were still available at other filter lengths,
only at a smaller number of iterations than maximizing
the V norm would require. Since our goal is to produce a
class of candidate solution signals for classification, this is
useful in that it may be exploited to increase the
probability of having a "good" solution in the class of
signal candidates, albeit at the expense of adding a
dimension to the search space.


The degree of kurtosis possessed by the Green's
function representing the multipath distortion appears to
be an important factor in the quality of the results. The
final kurtosis value of the convolved signal may also be
important; this remains to be determined.

Fig. 6. Best and worst source estimates for
SPSIMUL/CONVRG a) 600 meter range, γ =
0.975, b) 7900 meter range, γ = 0.864.

Abstract

In this paper we introduce a general distribution, called
the Generalised Bessel K (GBK) distribution, that involves
the modified Bessel function of the second kind. The
statistical properties of the proposed distribution as well as its
application to coherent modelling of radar clutter are
investigated. It is shown that the GBK-distribution includes
a large number of the well known clutter models and is still
mathematically tractable.


1.  Introduction 

Detecting targets embedded in clutter is one of the
important tasks for a radar signal processing practitioner. In
parametric detection it is essential that the clutter be
understood and properly modelled.

In many applications clutter cannot be assumed to be
Gaussian. This has motivated work on modelling clutter
by non-Gaussian probability distributions. In practice, two
main problems are apparent. Firstly, it is not unusual to
encounter data that are incompatible with the distribution
at hand. Secondly, for optimal (in the Neyman-Pearson
sense) detection of signals in coherent and correlated clutter,
multivariate probability density functions are required [2, 7].

To overcome these problems one may consider modelling
clutter by a number of different probability models and use
classification techniques for determining which probability
model fits the given data most closely [10]. In practice,
however, this technique is difficult and involves a
considerable amount of computation. Thus, more general models
for clutter that lead to optimal detection are sought.

In this contribution we introduce a new clutter model that
involves modified Bessel functions of the second kind. We
call it the Generalised Bessel K function (GBK) distribution,
in contrast to the other generalisations of the K-distribution [5],
which involve modified Bessel functions of the first and
second kind. The GBK-distribution includes a large number of
the well known clutter models. At the same time it is
mathematically tractable. Application of the GBK-distribution
reduces the complexity in an adaptive radar system such as
the one proposed in [10], since only one clutter model needs
to be employed.

2.  Genesis  of  the  GBK-distribution 


A K-distributed random variable can be obtained from
the multiplication of a Rayleigh and a Gamma variate [6].
Another K-distribution can be obtained by multiplying an
Exponential and a Gamma variate [9]. Since the Exponential
distribution is a special case of the Gamma distribution,
the K-distribution can be generalised
by considering a distribution that originates from compounding
two Gamma distributions. Teich and Diament call such
a distribution the K'-distribution [9].

The K'-distribution can be generalised further by noting
that the Gamma distribution is a special case of the
generalised Gamma distribution [8]. Thus we consider a
distribution which originates from compounding two generalised
Gamma distributions. The two component distributions may
then be given as


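$$
f_{X|Y}(x \mid y) = \frac{c\,x^{c\alpha_1 - 1}}{y^{c\alpha_1}\,\Gamma(\alpha_1)}
\exp\left[-\left(\frac{x}{y}\right)^{c}\right] \tag{1}
$$

and

$$
f_Y(y) = \frac{c\,y^{c\alpha_2 - 1}}{\beta^{c\alpha_2}\,\Gamma(\alpha_2)}
\exp\left[-\left(\frac{y}{\beta}\right)^{c}\right] \tag{2}
$$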


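where all parameters are assumed to be positive, f(· | ·)
denotes the conditional pdf, and Γ(·) is the standard Gamma
function. The pdf of the GBK-distributed random variable
X is derived using the integral formula [3, p. 313, Eq. 17]
as

$$
f_X(x) = \int_0^\infty f_{X|Y}(x \mid y)\, f_Y(y)\, dy
= \frac{2c}{\beta\,\Gamma(\alpha_1)\Gamma(\alpha_2)}
\left(\frac{x}{\beta}\right)^{\frac{c}{2}(\alpha_1+\alpha_2) - 1}
K_{\alpha_1-\alpha_2}\!\left(2\left(\frac{x}{\beta}\right)^{c/2}\right)
= f_X(x \mid \alpha_1, \alpha_2, \beta, c) \tag{3}
$$

where K_ν(·) is the modified Bessel function of the second
kind of order ν.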
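The compounding construction can be illustrated with a short sampling sketch. It assumes the generalised Gamma components in (1) and (2), for which (X/y)^c given Y = y and (Y/β)^c are unit-scale Gamma variates; the function name `sample_gbk` is hypothetical and not taken from the paper.

```python
import random

def sample_gbk(alpha1, alpha2, beta, c, rng):
    """Draw one GBK variate by compounding two generalised Gamma laws."""
    # (Y / beta)^c is a unit-scale Gamma(alpha2) variate, so
    # Y = beta * G2^(1/c) follows the generalised Gamma pdf in (2).
    y = beta * rng.gammavariate(alpha2, 1.0) ** (1.0 / c)
    # Given Y = y, (X / y)^c is a unit-scale Gamma(alpha1) variate,
    # so X = y * G1^(1/c) follows the conditional pdf in (1).
    return y * rng.gammavariate(alpha1, 1.0) ** (1.0 / c)

# Example: parameters (0.5, 1, 2*sigma, 4) with sigma = 1, which the
# text identifies as one of the Rayleigh special cases.
rng = random.Random(0)
samples = [sample_gbk(0.5, 1.0, 2.0, 4.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

For this parameter choice the sample mean should lie close to the Rayleigh mean σ√(π/2), which gives a quick sanity check of the construction.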


0-8186-7576-4/96  $5.00  ©  1996  IEEE 




3.  Properties  of  the  GBK-distribution 

The GBK-distribution is fully characterised by the four
parameters α₁, α₂, β, and c. Note that α₁ and α₂ are
interchangeable due to the symmetry property of the modified
Bessel function. The parameters α₁, α₂, and c control the
shape of the probability density function while the parameter
β controls its scale. In Figures 1-3 the probability density
curves are shown where one parameter at a time is varied.


Figure 1. Pdfs for a GBK-distributed variate
with α₁ = 1, α₂ = 2, and β = 1 for four different
values of c.


Figure 2. Pdfs for a GBK-distributed variate
with α₁ = 1, α₂ = 2, and c = 2 for four different
values of β.


Figure 3. Pdfs for a GBK-distributed variate
with α₂ = 2, β = 1, and c = 2 for four different
values of α₁.


The GBK-distribution includes a large number of
distributions which occur frequently in data modelling. In
Table 1 some of these distributions are reproduced with the
appropriate parameter selection. The notation used for
each of the distributions corresponds to the given
reference in the third column of Table 1. In particular, the
GBK-distribution includes the generalised Gamma
distribution, both types of aforementioned K-distributions, the
K'-distribution, and Jakeman and Tough's generalisation
of the K-distribution derived from a random walk in
other than 2 dimensions [5, Eq. 2.11].

In radar applications it is desirable that the Rayleigh
distribution (the first order amplitude distribution of a complex
Gaussian process) be included in the general model [2].
For the GBK-distribution the Rayleigh distribution is
included for two sets of parameters, f_X(x | 0.5, 1, 2σ, 4) and
f_X(x | 1, ν → ∞, 2a, 2). Note that the second set of
parameters involves a limit, as in the case of the K-distribution [6]. On
the other hand the first set of parameters is finite. Thus, the
reduction of the GBK-distribution to the Rayleigh
distribution is numerically more stable than that of the K-distribution.


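Moments. The kth-order moment of the GBK-distribution
is derived using [3, p. 313, Eq. 15] as

$$
E[X^k] = \int_0^\infty x^k f_X(x)\, dx
= \beta^k\, \frac{\Gamma\!\left(\alpha_1 + \frac{k}{c}\right)\Gamma\!\left(\alpha_2 + \frac{k}{c}\right)}
{\Gamma(\alpha_1)\,\Gamma(\alpha_2)} \tag{4}
$$

where E[ · ] denotes the expectation operator.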
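As a small numerical illustration of (4), the sketch below evaluates the moment formula and checks it against the Rayleigh special case (0.5, 1, 2σ, 4) mentioned earlier. The function name `gbk_moment` is an assumption made for this example.

```python
import math

def gbk_moment(k, alpha1, alpha2, beta, c):
    """k-th moment E[X^k] of the GBK-distribution; k may be fractional."""
    return (beta ** k
            * math.gamma(alpha1 + k / c) * math.gamma(alpha2 + k / c)
            / (math.gamma(alpha1) * math.gamma(alpha2)))

# Rayleigh special case with sigma = 1: parameters (0.5, 1, 2*sigma, 4).
sigma = 1.0
rayleigh_mean = gbk_moment(1, 0.5, 1.0, 2.0 * sigma, 4.0)   # sigma * sqrt(pi / 2)
rayleigh_power = gbk_moment(2, 0.5, 1.0, 2.0 * sigma, 4.0)  # 2 * sigma^2
```

Because k enters only through the Gamma functions, the same function serves for the fractional moments used in the parameter estimation section.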


Spherical Invariance. In order to design optimal detection
schemes one needs to show that the first order envelope pdf
given by (3) fulfils the requirements of a spherically invariant
random process (SIRP). The clutter process is spherically
invariant if its Nth-order pdf can be given as

$$
f_{\mathbf{x}}(\mathbf{x}) = (2\pi)^{-N}\, |\mathbf{M}|^{-1/2}\,
h_{2N}\!\left(\mathbf{x}^T \mathbf{M}^{-1} \mathbf{x}\right), \tag{5}
$$

where x = [x_c1 x_c2 ... x_cN x_s1 x_s2 ... x_sN]^T, x_ci
and x_si, i = 1, ..., N, are the in-phase and quadrature
components of the radar clutter process, respectively, M is the
covariance matrix of x, and h_2N(·) is a suitable function.
Following [2, 7], we derive


h2N{q)  = 


r(ai)r(a2) 


5  (ct  1 +ot2  1 )  “  2N 


t,  [2(V5«)*1, 

(6) 


k=l 


where  q  =  x^M  ^x, 

r  (ai  +  f )  r  (02  -H  f ) _ /  1,  N-odd 

“ "  2r(ai)r(a2)  ’  \  0.  N-even  ’ 

and the coefficients P(N,k) are calculated recurrently

P(N,k) = P(N-1,k) C(N,k) + P(N-1,k-1),
with


{0,  k>N 

1,  k  =  N 

+  k<N, 

P(0,0) = 1, P(N,0) = 0, and P(0,k) = 0. The result
given in Eq. (6) satisfies the monotonicity condition for
α₁c < 2. Thus, the GBK-distribution can be used for
coherent modelling of clutter when α₁c < 2. Such a result
enables us to determine which of the nested distributions
can represent a first order amplitude distribution of an SIRP.
The conditions for this representation are given in the last
column of Table 1.


4.  Parameter  Estimation 


The practicality of the GBK-distribution requires the
estimation of its parameters. The derivation of maximum
likelihood estimates of the parameters of the GBK-distribution
is cumbersome [4]. Therefore, a feasible alternative for
estimating the parameters c, α₁, α₂, and β in (3) is the one
based on higher order moments. Using (4) one can estimate
the required parameters using any four sample moments of
the data including fractional moments. In general, the
estimation procedure has to be performed numerically in a
four-dimensional parameter space.

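Estimation is greatly simplified if the parameter c is
known. In such a case, the parameters α₁, α₂, and β can be
obtained using the set of three different moment ratios

$$
R_{k_i} = \frac{E[X^{k_i c}]}{E[X^{(k_i - 1)c}]}
= (\alpha_1 + k_i - 1)(\alpha_2 + k_i - 1)\,\beta^{c}, \quad i = 1, 2, 3. \tag{7}
$$

Letting k₁ = p, k₂ = q, k₃ = r, and replacing the moments
by their sample counterparts one obtains

$$
\hat{\beta}^{c} = \frac{p(\hat{R}_r - \hat{R}_q) + q(\hat{R}_p - \hat{R}_r) + r(\hat{R}_q - \hat{R}_p)}
{(q-p)(p-r)(r-q)} \tag{8}
$$

and

$$
\hat{\alpha}_{1,2} =
\frac{\hat{R}_p[(q-1)^2 - (r-1)^2] + \hat{R}_q[(r-1)^2 - (p-1)^2] + \hat{R}_r[(p-1)^2 - (q-1)^2]}
{2[\hat{R}_p(r-q) + \hat{R}_q(p-r) + \hat{R}_r(q-p)]}
\pm \frac{\sqrt{Z}}{2[\hat{R}_p(r-q) + \hat{R}_q(p-r) + \hat{R}_r(q-p)]},
$$

where

$$
\begin{aligned}
Z ={}& \hat{R}_p^2 (q-r)^4 + \hat{R}_q^2 (p-r)^4 + \hat{R}_r^2 (p-q)^4 \\
& - 2\hat{R}_p\hat{R}_q\,[r^4 - 2r^3(p+q) + r^2(p^2 + 4pq + q^2) - 2r(pq^2 + p^2q) + p^2q^2] \\
& - 2\hat{R}_q\hat{R}_r\,[p^4 - 2p^3(q+r) + p^2(q^2 + 4qr + r^2) - 2p(qr^2 + q^2r) + q^2r^2] \\
& - 2\hat{R}_p\hat{R}_r\,[q^4 - 2q^3(p+r) + q^2(p^2 + 4pr + r^2) - 2q(pr^2 + p^2r) + p^2r^2].
\end{aligned}
$$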
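The known-c estimator can be sketched as follows. The code relies on the observation that, by (7), R_k is a quadratic in k with leading coefficient β^c and roots at k - 1 = -α₁ and k - 1 = -α₂, so fitting that quadratic through three ratios is algebraically equivalent to the closed-form expressions above. The function names and the choice to return None for non-real or non-positive fits are assumptions of this sketch.

```python
import math

def estimate_known_c(ks, ratios, c):
    """Estimate (alpha1, alpha2, beta) from three moment ratios when c is known.

    ks     -- the three orders (p, q, r)
    ratios -- the matching ratios (R_p, R_q, R_r)
    Returns None when the fitted quadratic has no real roots or a
    non-positive leading coefficient.
    """
    (p, q, r), (Rp, Rq, Rr) = ks, ratios
    D = (q - p) * (p - r) * (r - q)
    # Quadratic R_k = A u^2 + B u + C in u = k - 1 (Lagrange interpolation).
    A = (p * (Rr - Rq) + q * (Rp - Rr) + r * (Rq - Rp)) / D   # beta^c, as in (8)
    B = (Rp * ((r - 1) ** 2 - (q - 1) ** 2)
         + Rq * ((p - 1) ** 2 - (r - 1) ** 2)
         + Rr * ((q - 1) ** 2 - (p - 1) ** 2)) / D            # beta^c (alpha1 + alpha2)
    C = (Rp * (q - 1) * (r - 1) * (q - r)
         + Rq * (p - 1) * (r - 1) * (r - p)
         + Rr * (p - 1) * (q - 1) * (p - q)) / D              # beta^c alpha1 alpha2
    disc = B * B - 4.0 * A * C
    if A <= 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (B - root) / (2.0 * A), (B + root) / (2.0 * A), A ** (1.0 / c)

def exact_ratio(k, alpha1, alpha2, beta, c):
    """Population value of R_k from (7), used here only as test input."""
    return (alpha1 + k - 1) * (alpha2 + k - 1) * beta ** c

# Example with exact ratios for alpha1 = 1, alpha2 = 2, beta = 1.5, c = 2.
ks = (1.0, 1.5, 2.0)
ratios = [exact_ratio(k, 1.0, 2.0, 1.5, 2.0) for k in ks]
estimate = estimate_known_c(ks, ratios, 2.0)
```

In practice the ratios would be replaced by sample ratios, and closely spaced orders such as those used in the simulations make the fit more sensitive to sampling error.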


Taking into consideration the fact that estimates based on
higher order moments show large variability, it is of interest
to estimate the parameters from lower order moments by
appropriately selecting the parameters k₁, k₂, and k₃. Thus,
one can use fractional moments, i.e., of order other than a
positive integer.

This  estimation  technique  can  be  extended  to  the  case 
where  c  is  unknown  as  follows. 

STEP 1. Set the parameter c to c₀ > c (via initial guess).

STEP 2. Calculate the estimates of α₁, α₂, and β.

STEP 3. Repeat STEP 2 with c₁ = c₀ - s, where s
is a step, until α̂₁, α̂₂, and β̂ are positive.
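The three steps can be sketched as a simple downward search over c. This is a minimal illustration, not the authors' implementation: the helper names, the callable `ratio_fn(k, c)` that supplies the moment ratio for a trial c, and the stopping rule at c <= 0 are all assumptions.

```python
import math

def fit_ratios(ks, ratios):
    """Fit R_k = A (alpha1 + k - 1)(alpha2 + k - 1); return (alpha1, alpha2, A) or None."""
    (p, q, r), (Rp, Rq, Rr) = ks, ratios
    D = (q - p) * (p - r) * (r - q)
    A = (p * (Rr - Rq) + q * (Rp - Rr) + r * (Rq - Rp)) / D
    B = (Rp * ((r - 1) ** 2 - (q - 1) ** 2) + Rq * ((p - 1) ** 2 - (r - 1) ** 2)
         + Rr * ((q - 1) ** 2 - (p - 1) ** 2)) / D
    C = (Rp * (q - 1) * (r - 1) * (q - r) + Rq * (p - 1) * (r - 1) * (r - p)
         + Rr * (p - 1) * (q - 1) * (p - q)) / D
    disc = B * B - 4.0 * A * C
    if A <= 0.0 or disc < 0.0:
        return None          # complex roots or negative beta^c
    root = math.sqrt(disc)
    return (B - root) / (2.0 * A), (B + root) / (2.0 * A), A

def sample_ratio(xs, k, c):
    """Sample counterpart of R_k for a trial value of c."""
    num = sum(x ** (k * c) for x in xs) / len(xs)
    den = sum(x ** ((k - 1) * c) for x in xs) / len(xs)
    return num / den

def estimate_unknown_c(ratio_fn, c0, step, ks=(1.0, 1.5, 2.0)):
    """STEP 1-3: start at c0 and decrease c until all estimates are positive."""
    c = c0
    while c > 0.0:
        fit = fit_ratios(ks, [ratio_fn(k, c) for k in ks])
        if fit is not None and fit[0] > 0.0 and fit[1] > 0.0:
            alpha1, alpha2, A = fit
            return alpha1, alpha2, A ** (1.0 / c), c
        c -= step
    return None
```

With data `xs`, a call such as `estimate_unknown_c(lambda k, c: sample_ratio(xs, k, c), c0=6.0, step=0.1)` would mirror the procedure; the step size 0.1 is an arbitrary choice for illustration.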

In the algorithm we take into consideration the fact that for
c₀ greater than the true value of c the obtained estimates
of α₁, α₂, and β may be negative or complex. Simulation
results for k₁ = 1.001, k₂ = 1.05, k₃ = 1, and c₀ = 6
are given in Tables 2-4 where the parameter selections are
equivalent to the ones in Figures 1-3, respectively. Averages
were computed over 500 independent trials in each case.

From the results one concludes that there exists a strong
relationship between the estimates of β and c, a weak
relationship between the estimates of α₂ and c, and practically
no relationship between the estimates of α₁ and c. Also,
there exists a relationship between the estimates of α₁ and
α₂ which, as mentioned earlier, are interchangeable. An
attempt has been made to improve upon β̂ when c < 2 by
using the maximum likelihood based result

$$
\log\hat{\beta} = \frac{1}{N}\sum_{i=1}^{N} \log(x_i)
- \frac{1}{\hat{c}}\left(\Psi(\hat{\alpha}_1) + \Psi(\hat{\alpha}_2)\right),
$$

where Ψ(·) is the digamma function, but no significant
changes in the estimates were observed when compared to
the ones obtained using moments.

5.  Conclusions 

A general distribution, called the GBK-distribution, was
derived for coherent modelling of radar clutter. The
GBK-distribution is completely characterised by a set of four
parameters. It includes a large number of popular
distributions, while remaining mathematically tractable. The
estimation of the parameters of the GBK-distribution was
investigated and an estimation method based on moments
was proposed.
Table 1. Special cases of the GBK-distribution,
f_X(x | α₁, α₂, β, c).


Type^ 

pdf 

Ref. 

SIRP 

X 

fx(x 

^,^.2.4) 

f,4^.2.2) 

1,1, 2, 4) 

2> 

[8] 

n<  2 

fx(x 

[8] 

71  <  4 

CG 

fx{x 

[8] 

Yes 

exp 

fx{x 

[8] 

Yes 

r 

fx{x 

[8] 

i/<2 

GF 

fx{x 

|,i^,a2l/^2p) 

[8] 

pv  <2 

GHG 

fx{x 

i,i^,a2l/^2«.) 
1. 1,2,4) 

[1] 

Yes 

GHL 

fx{x 

[1] 

Yes 

HG 

fxix 

[8] 

Yes 

J&T 

fx{x 

t,«,f,2) 

[5] 

n<2 

K 

fx{x 

1,  z/  +  1, 2a,  2) 

[6] 

Yes 

Ko 

fx{x 

1,1,JV,1) 

[9] 

Yes 

K 

fx{x 

[9] 

Yes 

K’ 

fx{x 

[9] 

P<2 

R 

fx{x 

|,1,2<t,  4) 

[8] 

Yes 

R 

fx(x 

1,1/  -y  oo,  2a,  2) 

[6] 

Yes 

SG 

fx{x 

f,  1,2,4) 

[8] 

No 

W 

fx(x 

i,l,a2‘/P,2p) 

[8] 

p<2 

¹CG-Circular Gaussian, Γ-Gamma, GΓ-Generalised Gamma,
GHG-Generalised Half Gaussian, GHL-Generalised Half Laplace,
HG-Half Gaussian, J&T-Jakeman & Tough's model, R-Rayleigh,
SG-Spherical Gaussian, W-Weibull.


Table 2. Sample mean and variance of the parameter
estimates of a GBK variate with α₁ = 1,
α₂ = 2, β = 1, and varying c for N = 500.


■ 

J 

1 _ 9:2 _ J 

c 

E[ai] 

Var[ai] 

E[&2] 

Var[d2]  1 

ai 

BiihonB 

1.2391 

Q 

1.2575 

Q 

R ' 

1.3342 

Bl 

B 

1.5599 

0.3208  1 

HI 

Bl 

mM 

lliil 

ni 

2.1744 

WSiSSM 

1.3421 

2.2838 

Q 

1.1844 

0,0643 

3,3874 

0.3285 

B 

1.0613 

0,0189 

5.3943 

Table 3. Sample mean and variance of the parameter
estimates of a GBK variate with α₁ = 1,
α₂ = 2, c = 2, and varying β for N = 500.

β             E[α̂₁]    Var[α̂₁]   E[α̂₂]    Var[α̂₂]
[illegible]   1.0941   0.1623    1.2381   0.1881
[illegible]   1.1115   0.1690    1.2514   0.1974
1             1.1262   0.1785    1.2836   0.1967
2             1.1366   0.1646    1.2813   0.1843

β             E[β̂]     Var[β̂]    E[ĉ]     Var[ĉ]
[illegible]   0.4549   0.0204    2.3038   0.1903
[illegible]   0.6756   0.0484    2.2894   0.1927
1             1.3234   0.2123    2.2705   0.2158
2             2.6448   0.7808    2.2592   0.1938

Table 4. Sample mean and variance of the parameter
estimates of a GBK variate with α₂ = 2,
β = 1, c = 2, and varying α₁ for N = 500.


!~l 

I 

1  02  1 

191 

E[di] 

E[d2] 

Var[d2]  1 

■jiTgn 

U 

IE  9 

1.1983 

■nffRB 

2.0831 

1.5862 

2.3401 

2.1653 

H 

2.4552 

3.0989 

2.7858 

4.0916 

I^B 

1  P  •'! 

1 _ E _ 

ai 

E[4] 

Var[4] 

lEil 

Var[c] 

1 

1.8653 

0.0670 

2.7534 

0.1080 

1 

1.4043 

0.1748 

2.3418 

0.1853 

2 

1,1210 

0.3175 

2.0974 

0.2458 

3 

1.2372 

0.4245 

2.1504 

0.2786 
Abstract

This paper deals with robust estimation of AR parameters.
We compare the performance of the LMS algorithm
to the performance of two robust, adaptive algorithms: the
LMAD algorithm of Shao and Nikias, in which the error
signal in the LMS algorithm is hard-limited before being used
to control the weights, and the LLMS algorithm, in which
the input process is soft-limited before the LMS algorithm
is applied. The comparison is done in terms of rate of
convergence and stability (steady state variance). We show
that with a proper choice of limiting level, the LLMS
algorithm outperforms the LMAD algorithm when applied to
symmetric α-stable processes with 1 < α < 2.


1.  Introduction  and  background 

Recently, there has been increasing interest in signal
processing of non-Gaussian processes. In general, a zero-mean
symmetric distribution deviates from the normal distribution
either by its local properties (about the origin) or by the fact
that the tails of the probability density function (pdf) decay
more slowly than the tails of the Gaussian pdf. While local
features of the pdf are very sensitive to the presence of additive
Gaussian noise, heavy-tailed signals preserve their nature
even in the presence of such noise. In this paper we
focus on heavy-tailed non-Gaussian processes, which
reflect rare but strong values in the signal, i.e., impulsive-like
processes.

The traditional approach to robust signal processing in
the presence of impulsive noise involves passing the input
through a non-linear device (such as a limiter) prior to the
conventional, second order processing (e.g., [1]).
Alternatively, Shao and Nikias [3] suggest modelling a zero-mean,
symmetric heavy-tailed process as an α-stable process [2]
and matching an optimal procedure to the underlying
distribution. In this paper we study the differences between the
two approaches by comparing the performance of
algorithms based on each approach for adaptive estimation of
AR parameters.

Consider the first order AR process:

$$ x(n) = a\,x(n-1) + v(n) \tag{1} $$

where a is a constant to be estimated and v(n) is a
heavy-tailed, zero-mean, symmetric process. We compare the
performance of 3 adaptive algorithms for estimating a:

•  The LMS algorithm [4]:

In this algorithm, which is optimal for Gaussian
processes, the estimate of the AR parameter a, â_LMS, is
the steady state solution of the difference equation:

$$ w_{LMS}(n+1) = w_{LMS}(n) + \mu\, x(n-1)\, e(n) \tag{2} $$

where e(n) = x(n) - w_LMS(n) x(n-1) and
w_LMS(0) = 0.

•  The LLMS algorithm:

Here the estimate of a, â_LLMS, is the steady state
solution of the difference equation:

$$ w_{LLMS}(n+1) = w_{LLMS}(n) + \mu\, x_L(n-1)\, e(n) \tag{3} $$

where e(n) = x_L(n) - w_LLMS(n) x_L(n-1) and
w_LLMS(0) = 0. x_L(n) is the limited input signal,
i.e., x(n) after passing through a limiter.

•  The LMAD algorithm [3]:

In this algorithm, the estimate of a, â_LMAD, is the
steady state solution of the difference equation:

$$ w_{LMAD}(n+1) = w_{LMAD}(n) + \mu\, x(n-1)\, \mathrm{sign}\{e(n)\} \tag{4} $$

where e(n) = x(n) - w_LMAD(n) x(n-1) and
w_LMAD(0) = 0. This algorithm is based on modelling
the signal as a symmetric α-stable signal and matching
an algorithm to the assumed distribution. Note that the
LMAD algorithm of [3] amounts to hard-limiting the
error signal in the usual LMS algorithm (2).
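The three update rules differ only in how the input and the error enter the weight update, which a short sketch makes explicit. The function names and the `rule` argument are hypothetical; the limiter replaces any sample whose magnitude reaches c by a fixed value c_l, following the soft limiter described in Section 2 (treating the boundary |x| = c as limited is an assumption).

```python
def soft_limit(x, c, c_l=0.0):
    """Pass x through unchanged if |x| < c, otherwise replace it by c_l."""
    return x if abs(x) < c else c_l

def sign(e):
    """Return -1, 0 or +1 according to the sign of e."""
    return (e > 0) - (e < 0)

def adapt(x, mu, rule="lms", c=100.0, c_l=0.0):
    """Run one of the three adaptive AR(1) estimators over the sequence x.

    rule -- "lms", "llms" (limited input) or "lmad" (hard-limited error)
    Returns the list of weights after each update, starting from w(0) = 0.
    """
    if rule == "llms":
        x = [soft_limit(v, c, c_l) for v in x]   # LLMS works on the limited input
    w = 0.0
    history = []
    for n in range(1, len(x)):
        e = x[n] - w * x[n - 1]                  # prediction error e(n)
        g = sign(e) if rule == "lmad" else e     # LMAD keeps only the error sign
        w += mu * x[n - 1] * g
        history.append(w)
    return history
```

For example, `adapt(x, mu=0.01, rule="llms", c=100.0)` would run the limited-input variant; the step size shown is arbitrary and would need tuning as discussed in the text.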
Table 1. Percentage of samples truncated by
a limiter for various α and limit values c

α     c=1      3.2     10      31.6    100     316.2   1000    3162.2
1.0   50.0248  6.3325  0.0288  0.0623  0.0060  0.0005  0.0001  0
1.1   49.6025  4.8290  0.3753  0.0297  0.0024  0.0002  0       0
1.2   49.2903  3.5778  0.2190  0.0142  0.0012  0.0001  0       0
1.3   49.1153  2.6348  0.1299  0.0070  0.0004  0       0       0
1.4   48.9080  1.8972  0.0720  0.0025  0       0       0       0
1.5   48.6815  1.3296  0.0411  0.0016  0       0       0       0
1.6   48.5489  0.8810  0.0219  0.0005  0       0       0       0
1.7   48.4386  0.5553  0.0114  0.0002  0       0       0       0
1.8   48.2056  0.3037  0.0042  0       0       0       0       0
1.9   48.0650  0.1295  0.0017  0       0       0       0       0
2.0   47.9901  0       0       0       0       0       0       0

The LMS algorithm is well known. The LMAD algorithm
is discussed in depth in [3]. We discuss the LLMS algorithm
in Section 2. In Section 3 we present the results of the
comparison between the 3 algorithms and discuss them.

2.  The  LLMS  algorithm 


Table 2. Steady state value and variance (in parentheses) of
the LLMS for various α and c

c         α = 1.0     α = 1.3     α = 1.5     α = 1.7     α = 2.0
1.0       0.968614    0.947430    0.941695    0.932215    0.888754
          (0.003705)  (0.004636)  (0.005341)  (0.006189)  (0.007360)
3.2       0.965060    0.939276    0.926273    0.919618    0.866334
          (0.003293)  (0.003610)  (0.003585)  (0.003737)  (0.004694)
10.0      0.957781    0.931475    0.917954    0.919880    0.895272
          (0.002835)  (0.002261)  (0.002955)  (0.002124)  (0.002302)
31.6      0.943861    0.930466    0.933893    0.945618    0.945512
          (0.002202)  (0.001670)  (0.000897)  (0.000408)  (0.000160)
100.      0.942769    0.955895    0.962565    0.973128    0.945521
          (0.001475)  (0.000315)  (0.000326)  (0.000035)  (0.000160)
316.2     0.961561    0.976227    0.981633    0.977140    0.945521
          (0.000371)  (0.000418)  (0.000007)  (0.000018)  (0.000160)
1000.0    0.979454    0.969481    0.982322    0.977140    0.945521
          (0.000047)  (0.003460)  (0.002821)  (0.000018)  (0.000160)
3162.3    0.986577    0.956853    0.982322    0.977140    0.945521
          (0.000778)  (0.028933)  (0.002821)  (0.000018)  (0.000160)
10000.0   0.943689    0.946341    0.982322    0.977140    0.945521
          (0.017883)  (0.018018)  (0.002821)  (0.000018)  (0.000160)

The LLMS algorithm represents the traditional approach
to robust signal processing of impulsive-like signals, where
rare, strong values are replaced by a pre-fixed value. In this
paper we suggest the use of a soft limiter, so

$$
x_L(t) =
\begin{cases}
x(t), & |x(t)| < c \\
c_L, & |x(t)| > c
\end{cases} \tag{5}
$$

where c determines the limiting range and c_L determines the
limiting value. The LMS algorithm is then applied to the
limited input, x_L(n). The main question is how to choose
the values of c and of c_L. To study the effect of a limiter,
we have simulated zero-mean, symmetric α-stable processes
of unit covariation.
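The paper does not state how the α-stable samples were generated. One common choice, shown here purely as an assumption, is the Chambers-Mallows-Stuck transformation, which yields symmetric α-stable variates with unit dispersion and zero location. The function name `sample_sas` is hypothetical.

```python
import math
import random

def sample_sas(alpha, rng):
    """Draw one symmetric alpha-stable variate (unit dispersion), 0 < alpha <= 2.

    Uses the Chambers-Mallows-Stuck transformation of a uniform phase
    and a unit-mean exponential variate.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)    # uniform phase
    if alpha == 1.0:
        return math.tan(u)                        # Cauchy case
    w = max(rng.expovariate(1.0), 1e-300)         # guard against a zero draw
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

# Example: a short impulsive driving sequence for the AR(1) model.
rng = random.Random(1)
noise = [sample_sas(1.5, rng) for _ in range(3000)]
```

With α = 2 the transformation reduces to a Gaussian variate of variance 2, and with α = 1 to a standard Cauchy variate, which gives two easy consistency checks.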

Fig. 1 presents the average of the learning curves (with
the same μ) of 200 runs of α-stable processes with α = 1, for
which w_LLMS of (3) is plotted as a function of n. For
each value of c, μ is chosen such that the rate of
convergence of the algorithm is maximized while keeping its
steady state variance smaller than a given value. It shows that
the rate of convergence of the LLMS algorithm increases
as c decreases. Table 1 presents the percentage of samples
truncated by a limiter for various values of α and of c,
where c_L = 0. It shows that for c > 300, the number of
truncated samples is practically negligible for any 1 < α <
2. The translation of this observation to the design of the
LLMS algorithm is not straightforward since the effect
of truncation on a sufficient statistic for estimating the AR
parameters of a sequence is not simple. In Table 2 we present
the mean and the variance of the steady-state estimate of the
AR parameter a = 0.99 using the LLMS algorithm for
different values of c (here c_L = 0). We derive the statistics
of the estimate from 100 Monte-Carlo runs, each
using 3000 samples, which is significantly more
than the acquisition time of the algorithm, as shown in Fig.
1. It shows that for all values of 1 < α < 2, the steady state
performance is best for 300 > c > 100.

Therefore, the well-known trade-off between speed of
reaction and stability is also preserved in the LLMS
algorithm. The best choice of the parameter c, which controls
this trade-off by maximizing the rate of convergence without
hurting the steady-state performance of the algorithm much,
seems to be c = 100. Note, however, that this value is
applicable to processes of unit covariation. For processes
with different covariation, the value of c should be scaled
accordingly.

The choice of c_L depends on the application. In our case,
we try to match an AR model to the data. Since a first
order AR process is unlikely to have very strong values, it
is reasonable to suppress them by letting c_L be the median
of the data (in our case, c_L = 0). Indeed, our simulation
study shows that as c_L decreases, the steady state variance
of the LLMS algorithm decreases (see Fig. 2). In other
applications, where the rare, strong values may better fit the
problem, it may be more reasonable to assume c_L = c or an
in-between value such as c_L = 0.5c.

3.  Comparison  of  the  3  algorithms  and  discus¬ 
sion 

We have simulated the 3 algorithms for a = 0.99 where
x(n) is a symmetric, normalized (to unit covariation)
α-stable signal of 3000 samples. For the LLMS algorithm
we have set c_L = 0 and c = 100, following the
discussion in Section 2. The performance of the algorithms is
evaluated from 100 Monte-Carlo runs. In Fig. 3 we present


Figure 1. The learning curves (averaged over
200 runs with the same μ) of the LLMS
algorithm with different limiting levels (c =
1, 3, 10, 30, 100, 300, 1000, 3000, 10000, c_L = 0).
The input is a symmetric α-stable process of
α = 1.


the average of 100 learning curves of the 3 algorithms for an
α-stable process with α = 1.5, built as for the experiment of
Fig. 1. It confirms that with a heavy-tailed signal, the LMAD
outperforms the LMS significantly, as first suggested in [3].
Note, however, that if μ is chosen under the same criterion
(to maximize the rate of convergence of the algorithm while
keeping its steady state variance smaller than a given value)
for each run separately and then the learning curves of the
100 runs are averaged, the advantage of the LMAD over
the LMS is dramatically less significant (see Fig. 4).

The difference between Fig. 3 and Fig. 4 is explained
by the effect of a rare, large value of x(n) on the LMS
algorithm. For a given μ, the presence of such a large value
slows down the convergence of the algorithm dramatically.
When μ is adjusted to each sequence separately, the spread
of the μ values over the 100 α-stable sequences can be shown to
be large, but the averaged learning curve converges much
faster than that of the averaged algorithm (with the same μ)
and is much smoother.

In  Fig.  5  we  present  the  equivalent  of  Fig.  3  for  the  case 
where  the  input  happens  to  be  Gaussian  (a  =  2).  It  shows 
that  while  the  LMAD  algorithm  is  worse  than  the  LMS 
algorithm,  the  LLMS  algorithm  works  similarly  well  to  the 
LMS  for  Gaussian  data. 

That  is,  with  heavy-tails  signals,  where  the  LMS  algo¬ 
rithm  fails,  the  LLMS  algorithm  outperforms  the  LMAD 
algorithm  (but  not  significantly)  in  both  rate  of  convergence 


Figure 2. The steady state learning curves
(averaged over 200 runs with the same μ) of the
LLMS algorithm with different limiting levels:
c_L = 0, 100 (c = 100). The input is a symmetric
α-stable process of α = 1.3.


and stability, independent of the presentations used (Fig. 3
and Fig. 4). With Gaussian data, the LMS algorithm
outperforms the LMAD but not the LLMS. Therefore, the
LLMS algorithm, when properly designed, is better than
the LMAD in handling heavy-tailed signals in terms of
robustness: its performance is as good as that of the LMAD
when applied to heavy-tailed signals, while performing as
well as the LMS (and better than the LMAD) with
Gaussian signals.

The advantage of the LLMS over the LMAD algorithm
can be explained by comparing equations (3) and (4) to (2).
In the LMS algorithm, the adaptation is controlled by the
error signal e(n): if the error is large the adaptation step is
larger, so it converges faster to the region of small errors,
where the fine adaptation is done. By hard-limiting the error
signal in the LMAD algorithm one loses this automatic
weighting of the adaptation step. The LLMS algorithm, on
the other hand, keeps this feature while handling spikes by
limiting the dynamic range of the input.
Figure 3. The learning curves (averaged over
100 runs with the same μ) of the LMS, the
LMAD and the LLMS algorithms (with c =
100, c_L = 0). The input is a symmetric
α-stable process of α = 1.5.


References 

[1] S. Kassam. Signal Detection in Non-Gaussian Noise.
Springer, New York, 1988.

[2] G. Samorodnitsky and M. Taqqu. Stable Non-Gaussian Random
Processes: Stochastic Models with Infinite Variance.
Chapman and Hall, New York, 1994.

[3] M. Shao and C. Nikias. Signal processing with fractional
lower-order moments: Stable processes and their applications.
Proceedings of the IEEE, 81:986-1010, July 1993.

[4] B. Widrow and S. Stearns. Adaptive Signal Processing.
Prentice Hall, Englewood Cliffs, New Jersey, 1985.


Figure 4. The averaged learning curves
(averaged over 100 runs with μ optimized for each
run) of the LMS, the LMAD and the LLMS
algorithms (with c = 100, c_L = 0). The input
is a symmetric α-stable process (α = 1.5).


Figure 5. The averaged learning curves
(averaged over 100 runs with the same μ) of the
LMS, the LMAD and the LLMS algorithms
(with c = 100, c_L = 0). The input is a symmetric
α-stable process of α = 2, i.e., a Gaussian
process.
Abstract

In the frequency estimation of sinusoidal signals
observed in impulsive noise environments, techniques
based on the Gaussian noise assumption are
unsuccessful. One possible way to find better estimates is to
model the noise as an alpha-stable process and to use
the fractional lower order statistics of the data to
estimate the signal parameters. In this work noise and
signal subspace methods, namely MUSIC and Principal
Component-Bartlett, are applied to fractional lower
order statistics of sinusoids embedded in alpha-stable
noise. The simulation results show that techniques
based on lower order statistics are superior to their
second order statistics-based counterparts, especially
when the noise exhibits strongly impulsive behaviour.


1.  Introduction 

Most of the work on the frequency estimation problem
assumes that the additive noise has a Gaussian
distribution. This is partly because of the nice properties
of the Gaussian model, which allows for simplification of
the theoretical work and decreases the computational
complexity in signal parameter estimation. As long as
the noise distribution fits a Gaussian model approximately,
in particular in the tails of the distribution,
one can obtain good estimators with the Gaussian
noise assumption. But if the noise process belongs to a
non-Gaussian, especially a heavy-tailed, distribution

*This work was supported by TUBITAK under contracts
EEEAG-83 and EEEAG-139.

^On  leave  from  the  Department  of  Electrical  and  Computer 
Engineering,  University  of  Southwestern  Louisiana,  Lafayette, 
LA  70504-3890,  USA. 


class  or  when  the  noise  is  of  impulsive  nature,  param¬ 
eter  estimators  which  are  based  on  Gaussian  noise  as¬ 
sumption  break  down. 

Impulsive noise processes can be modeled using stable distributions. If a signal can be thought of as the sum of a large number of independent and identically distributed random variables, the limiting distribution will be in the class of stable distributions according to the Generalized Central Limit Theorem [5]; the Gaussian distribution is the limiting member of this class (α = 2).

If the additive noise has a heavy-tailed distribution which is successfully modeled by alpha-stable distributions, the performance of covariation-based frequency estimators is better than that of the traditional estimators which are based on second order statistics.

In this work subspace-based estimation methods using covariations are considered. In Section 2, the SαS distributions are briefly discussed. In Section 3, the application of fractional lower order moments (FLOM) to the frequency estimation problem is presented. Section 4 covers the results of the simulation experiments. Finally, conclusions are given in Section 5.

2.  SαS Distributions

An important sub-class of stable distributions is the class of symmetric alpha-stable (SαS) distributions. The characteristic function of SαS variables is given by:

$$ \phi(\omega) = \exp\{ j\delta\omega - \gamma|\omega|^{\alpha} \} \tag{1} $$

where α is the characteristic exponent (0 < α ≤ 2), δ is the location parameter (−∞ < δ < ∞) and γ is the dispersion (γ > 0). Without loss of generality we may take the location parameter δ = 0, as in the zero mean Gaussian noise assumption case. This assumption leads to the characteristic function:

$$ \phi(\omega) = \exp\{ -\gamma|\omega|^{\alpha} \}. \tag{2} $$
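
As a minimal sketch of how samples obeying Eq. (2) might be drawn, the snippet below uses the symmetric special case of the Chambers, Mallows and Stuck transformation. The function name `sas_samples` and its arguments are illustrative assumptions, not taken from the paper; the scaling by γ^(1/α) follows from the dispersion entering Eq. (2) as γ|ω|^α.

```python
import numpy as np

def sas_samples(alpha, gamma, size, rng=None):
    """Draw SaS samples with characteristic function exp(-gamma*|w|**alpha).

    Symmetric case of the Chambers-Mallows-Stuck transformation
    (hypothetical helper, 0 < alpha <= 2).
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform phase
    w = rng.exponential(1.0, size)                # unit-mean exponential
    x = (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    # A unit-dispersion sample scaled by gamma**(1/alpha) has dispersion gamma.
    return gamma ** (1.0 / alpha) * x
```

For α = 2 the expression reduces to a Box-Muller-type Gaussian with variance 2γ, and for α = 1 it reduces to a Cauchy variable with scale γ.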

For SαS processes only the moments of order p < α exist, so estimation methods based on second order statistics of the data cannot be applied. One solution is to use the FLOM of the process [5]. The so-called covariations [4] of two random variables are used instead of second order moments in the analysis. The covariation of two jointly SαS real random variables X and Y with dispersions γ_x and γ_y is given as:

$$ [X, Y]_{\alpha} = \frac{E[X Y^{\langle p-1 \rangle}]}{E[|Y|^{p}]} \, \gamma_y \quad \text{with} \quad 1 \le p < \alpha \tag{3} $$

where γ_y = [Y, Y]_α is the dispersion of the random variable Y and $Y^{\langle p-1 \rangle} = |Y|^{p-1}\,\mathrm{sign}(Y)$.
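
The ratio of fractional lower order moments in Eq. (3) can be illustrated with sample averages. The sketch below is only an assumption about how one might estimate the covariation coefficient [X, Y]_α / [Y, Y]_α from data; the names `signed_power` and `flom_covariation_coeff` are hypothetical.

```python
import numpy as np

def signed_power(y, q):
    """Signed power y^<q> = |y|**q * sign(y)."""
    return np.abs(y) ** q * np.sign(y)

def flom_covariation_coeff(x, y, p):
    """Sample version of E[X Y^<p-1>] / E[|Y|^p] (hypothetical helper)."""
    num = np.mean(x * signed_power(y, p - 1.0))
    den = np.mean(np.abs(y) ** p)
    return num / den
```

When x is an exact multiple of y, the estimate returns that multiple for any admissible p, which is a quick sanity check on the definition.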

3.  Frequency Estimation Problem

In the frequency estimation problem the assumed signal model consists of multiple sinusoids

$$ s_n = \sum_{k=1}^{K} A_k \sin\{\omega_k n + \theta_k\} \tag{4} $$

observed in additive SαS noise

$$ x_n = s_n + z_n, \quad n = 1, \ldots, N \tag{5} $$

where $A_k$ is the amplitude, $\omega_k$ is the angular frequency, and $\theta_k$ is the phase of the kth real sinusoid, K is the number of sinusoids and N is the sample size. $x_n$ and $z_n$ are realizations of the observation sequence $X_n$ and the SαS noise sequence $Z_n$, respectively.

When the noise samples are independent and identically distributed, the observation sequence can be modeled as a stable AR process:

$$ X_n = a_1 X_{n-1} + \cdots + a_M X_{n-M} + b_0 Z_n. \tag{6} $$

This leads to the Generalized Yule-Walker Equation when $X_{n-m}$ is given [5]:

$$ E[X_n | X_{n-m}] = a_1 E[X_{n-1} | X_{n-m}] + \cdots + a_M E[X_{n-M} | X_{n-m}], \tag{7} $$

$$ E[X_{n+l} | X_n] = \lambda(l) X_n \tag{8} $$

where m = 1, ..., M. If λ(l) denotes the covariation coefficient of $X_{n+l}$ with $X_n$, one can find the AR parameters by solving the following linear set of equations:

$$ C \mathbf{a} = \boldsymbol{\lambda} \tag{9} $$

where

$$ C = \begin{bmatrix} \lambda(0) & \lambda(-1) & \cdots & \lambda(1-M) \\ \lambda(1) & \lambda(0) & \cdots & \lambda(2-M) \\ \vdots & \vdots & \ddots & \vdots \\ \lambda(M-1) & \lambda(M-2) & \cdots & \lambda(0) \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix}, \quad \boldsymbol{\lambda} = \begin{bmatrix} \lambda(1) \\ \lambda(2) \\ \vdots \\ \lambda(M) \end{bmatrix}. $$

In the frequency estimation of sinusoids given by Equations 4 and 5 the sinusoidal signal component can be assumed to be a stable AR process of order 2K. As in the Gaussian additive noise case, the model order M of the AR model for the noisy signal should be selected higher than 2K in order to allow sufficient additional subspace dimension for the noise component. Assuming that the signal and the noise components are stable processes with the same characteristic exponent, their covariation can be calculated as follows:

$$ [x_j, x_k]_{\alpha} = [s_j + e_j, s_k + e_k]_{\alpha} = [s_j, s_k]_{\alpha} + [s_j, e_k]_{\alpha} + [e_j, s_k]_{\alpha} + [e_j, e_k]_{\alpha} \tag{10} $$

where j, k = 1, ..., N. Since the signal and additive noise are assumed to be independent, the cross-covariation of noise and signal components with each other is

$$ [s_j, e_k]_{\alpha} = 0, \qquad [e_j, s_k]_{\alpha} = 0. \tag{11} $$

On the other hand the covariations of the signal component and noise component with themselves are found as:

$$ [s_j, s_k]_{\alpha} = \ldots \tag{12} $$

$$ [e_j, e_k]_{\alpha} = \delta_{j,k} \gamma_{e_k} \tag{13} $$

where $\delta_{j,k}$ is the Kronecker delta.
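
The linear system of Eq. (9) has a Toeplitz, generally non-symmetric, coefficient matrix built from the covariation coefficients λ(l). A minimal sketch of assembling and solving it follows; the dictionary-based interface `lam[l]` and the function names are assumptions made for illustration, and a general solver is used because symmetry cannot be relied on.

```python
import numpy as np

def yule_walker_system(lam, M):
    """Build C and the right-hand side of Eq. (9).

    lam: mapping from integer lag l in [-(M-1), M] to lambda(l)
    (hypothetical interface).
    """
    C = np.array([[lam[j - k] for k in range(M)] for j in range(M)])
    rhs = np.array([lam[l] for l in range(1, M + 1)])
    return C, rhs

def solve_ar_parameters(lam, M):
    """Solve C a = lambda for the AR parameters a_1..a_M."""
    C, rhs = yule_walker_system(lam, M)
    return np.linalg.solve(C, rhs)
```

Because λ(−l) need not equal λ(l), both positive and negative lags must be supplied.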

The covariation matrix for alpha-stable processes has the same meaning as that of the covariance matrix for Gaussian processes. As one performs eigen-decomposition of the covariation matrix, the larger eigenvalues will correspond to signal subspace eigenvectors and the remaining eigenvectors will constitute the noise subspace. So one can perform eigen-analysis on the covariation matrix and then apply a suitable noise subspace or signal subspace technique to estimate the parameters. Note that the covariation matrix is not symmetric. This makes the eigen-analysis more complicated and renders many of the subspace-based parameter estimation techniques developed for Gaussian processes unsuitable for general alpha-stable processes.

One such technique applied to the direction of arrival estimation problem is the Robust Covariation-Based MUSIC (ROC-MUSIC) [6]. In this work, we first apply ROC-MUSIC, which is a noise subspace method, to the problem of frequency estimation in alpha-stable environments, and then we also apply Robust Covariation-Based Bartlett (ROC-Bartlett), which is a signal subspace method, to the problem.

The second order statistics-based principal component Bartlett frequency estimate is obtained from the peaks of the spectrum estimator [3]:

$$ P_{\text{PC-Bartlett}}(\omega) = \sum_{i=1}^{2K} \lambda_i \, |\mathbf{d}^{H} \mathbf{v}_i|^{2} \tag{14} $$

where d is the complex sinusoidal vector $\mathbf{d} = [1 \;\; \exp\{j\omega\} \;\; \cdots \;\; \exp\{j\omega(M-1)\}]^{T}$, and $\lambda_i$ and $\mathbf{v}_i$ are the ordered eigenvalues, such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M$, and the corresponding eigenvectors of the M × M autocorrelation matrix. ROC-Bartlett is obtained by substituting the covariation matrix for the autocorrelation matrix.

4.  Simulation Experiments

We have used the ROC-MUSIC and ROC-Bartlett methods to estimate the frequency of a single real sinusoid. The modified FLOM (MFLOM) estimator given by [6] as Eq. (15) is defined for moment order p ∈ [0, 2], and it is used to estimate the (k, l)th element C(k, l) of the sample covariation matrix C. M denotes the order of the AR model. We have applied SαS noise sequences with varying α and γ parameters. To generate the SαS noise process we used the method described by Tsihrintzis and Nikias [7], which is a special case of the more general method, including non-symmetric alpha-stable random variable generation, given by Chambers, Mallows and Stuck [2]. The moment order p and the sample size N were equal to 0.8 and 50, respectively. The AR model order was chosen as 20 in the simulations. The generalized SNR, $\mathrm{GSNR} = 10 \log\left( \frac{1}{\gamma N} \sum_{n=1}^{N} |s(n)|^{2} \right)$, is equal to 5 dB.

Figure 1. Sample variance and bias of PC-Bartlett and ROC-Bartlett frequency estimators versus normalized angular frequency, a) PC-Bartlett, b) ROC-Bartlett (α = 1.0, p = 0.8 (ROC-Bartlett), M = 20, GSNR = 5 dB, N = 50, 100 noise realizations, 20 phase realizations).

Figure 2. Bias of MUSIC and ROC-MUSIC frequency estimators versus characteristic exponent of alpha-stable noise, a) PC-Bartlett and MUSIC, b) ROC-Bartlett and ROC-MUSIC (ω = 0.76 rad/sec, M = 20, GSNR = 5 dB, N = 50, 100 noise realizations, 20 phase realizations).

Figure 3. Variance reduction of ROC-Bartlett with respect to PC-Bartlett frequency estimator versus GSNR, averaged on the frequency axis, a) α = 1.0, b) α = 1.4, c) α = 1.8, d) α = 2.0 (M = 20, N = 50, 100 noise and phase realizations).

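
The principal component Bartlett spectrum of Eq. (14) can be sketched as follows for an arbitrary M × M matrix, so that the same routine could in principle be fed either an autocorrelation matrix (PC-Bartlett) or a covariation matrix (ROC-Bartlett). The function name, the use of a general eigen-solver, and the ordering of eigenvalues by magnitude are assumptions made here because the covariation matrix is not symmetric; they are not taken from the paper.

```python
import numpy as np

def pc_bartlett_spectrum(R, n_components, omegas):
    """Evaluate sum_i |lambda_i| * |d^H v_i|^2 over the principal components.

    R            : (M, M) autocorrelation or covariation matrix
    n_components : number of principal components kept (2K for K real sinusoids)
    omegas       : 1-D array of angular frequencies in rad/sample
    """
    eigvals, eigvecs = np.linalg.eig(R)            # general solver: R may be non-symmetric
    order = np.argsort(-np.abs(eigvals))[:n_components]
    m = np.arange(R.shape[0])
    D = np.exp(1j * np.outer(omegas, m))           # row n holds d(omega_n)^T
    proj = D.conj() @ eigvecs[:, order]            # entries are d^H v_i
    return (np.abs(proj) ** 2) @ np.abs(eigvals[order])
```

The frequency estimate is then the location of the largest peak of the returned spectrum on the chosen grid.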


4.1.  Frequency Dependence of Bias and Variance

In Figure 1 the sample variance and the bias of the PC-Bartlett and ROC-Bartlett frequency estimators are plotted against the angular frequency for α = 1.0 (Cauchy distribution) and GSNR = 5 dB. The numbers of noise realizations and phase realizations are 100 and 20, respectively, making a total of 2000 Monte Carlo runs. The ROC-Bartlett has approximately 5 dB lower sample variance than the PC-Bartlett.

The bias curves show a symmetry around approximately ω = 1.7 rad/sec. The ROC-Bartlett performs much better than the PC-Bartlett: the difference between their bias values is more than 0.4 rad/sec around ω = 0.2 rad/sec.

4.2.  Dependence of Bias upon α

The bias behaviour of the estimators for ω = 0.76 rad/sec as a function of the characteristic exponent α of the noise is shown in Figure 2. The figure indicates that the bias gets smaller as α increases. When α = 1 the bias is 0.45 rad/sec for PC-Bartlett and MUSIC, and less than 0.1 rad/sec for their ROC versions. As the figure shows, in the single tone case considered in our experiments the MUSIC and Bartlett estimators exhibit exactly the same performance.

4.3.  Dependence of Variance Reduction upon the GSNR

In Figure 3, the variance reduction achieved by ROC-Bartlett with respect to PC-Bartlett is plotted against GSNR for different values of α. The number of Monte Carlo runs is 100, each with a different noise and phase realization. The curve exhibiting the highest gain belongs to α = 1.0 (Cauchy noise). This gain is approximately 17 dB when GSNR = 20 dB. The curves show that any variance increase introduced by ROC-Bartlett relative to PC-Bartlett is negligible, except in the Gaussian noise case, where the GSNR threshold of the ROC estimator is higher than that of the second order statistics-based estimator. This behaviour validates the robustness of FLOM-based subspace techniques, and it is also shared by the noise subspace technique ROC-MUSIC.

5.  Conclusion 

When the additive noise in the frequency estimation problem can be modeled as an alpha-stable process, the FLOM-based subspace techniques perform better than their second order statistics-based counterparts. Both the ROC-MUSIC and ROC-Bartlett methods showed superior performance with respect to the MUSIC and PC-Bartlett methods in our experiments, especially for low α values.
ABSTRACT

We address the problem of coherent detection of a signal embedded in heavy-tailed noise modeled as a subGaussian, alpha-stable process. We assume that the signal is a complex-valued vector of length L, known only within a multiplicative constant. The dependence structure of the noise, i.e., the underlying matrix of the subGaussian process, is not known. The intent is to implement a generalized likelihood ratio detector which employs robust estimates of the unknown noise underlying matrix and the unknown signal strength. The performance of the proposed adaptive detector is compared to that of an adaptive matched filter that uses Gaussian estimates of the noise underlying matrix and the signal strength and is found to be clearly superior. The proposed new algorithms are evaluated via Monte-Carlo simulation.

Key words - Signal detection, subGaussian process, adaptive matched filter

1.  INTRODUCTION 

The design of modern signal processing systems includes the design of signal detectors that will operate in noise/interference that is inherently non-Gaussian and instead follows some distribution with tails that are significantly heavier than the tails of the Gaussian distribution. Such interference is termed "impulsive" and is characterized by a significant probability of its attaining high values. In an impulsive operational environment, traditional Gaussian receivers will perform very poorly and exhibit a number of false alarms or misses that is unacceptably high. Thus, a need arises to design receivers that maintain high performance when operating in the radar environment and are robust to fluctuations in the characteristics of the interference. This task can be achieved only if good statistical models are available to quantify the interference.

Classical statistical-physical models for impulsive interference have been proposed by Middleton [2, 3, 4, 5] and are based on the filtered-impulse mechanism. These models include three different classes of interference, namely A, B, and C. Interference in class A is "coherent" in narrowband receivers, causing a negligible amount of transients. Interference in class B, however, is "impulsive," consisting of a large number of overlapping transients. Finally, interference in class C is the sum of the other two interferences. The Middleton model has been shown to describe real impulsive interferences with high fidelity; however, it is mathematically involved for signal processing applications. This is particularly true of the class B model, which contains seven parameters, one of which is purely empirical and in no way relates to the underlying physical model. Moreover, mathematical approximations need to be used in the derivation of the Middleton model, which are equivalent to changes in the assumed physics of the noise and lead to ambiguities in the relation between the mathematical formulae and the physical scenario [1]. Very recently, an alternative to the Middleton model was proposed, which was based on the theory of symmetric, α-stable (SαS) distributions [8, 6].

In particular, it was shown in [9, 6] that, under very general assumptions, the first order distribution of impulsive interference follows a SαS law. The stable model was then tested with a variety of real data and found, in all cases examined, to match the data with high fidelity [9]. The performance of optimum and suboptimum receivers in the presence of SαS impulsive interference was examined in [12], both theoretically and via Monte-Carlo simulation, and a method was presented for the real time implementation of the optimum nonlinearities. From this study, it was found that the corresponding optimum receivers perform quite well in the presence of SαS impulsive interference, while the performance of Gaussian and other suboptimum receivers is unacceptably low. It was also shown that a receiver designed on a Cauchy assumption for the first order distribution of the impulsive interference performed only slightly below the corresponding optimum receiver, provided that a reasonable estimate of the noise dispersion was available, which for real-time signal processing purposes could be obtained via the fast algorithms in [11].

The study in [12] was later extended to include the optimum demodulation algorithm for reception of signals with random phase in impulsive interference [13], as well as in the direction of asymptotically optimum, multichannel detection structures for reception of amplitude-fluctuating bandpass signals [14]. In all cases, the key finding has been the same robustness result for Cauchy-based algorithms as opposed to their Gaussian counterparts.

In this paper, we look at the problem of coherent detection of a signal embedded in heavy-tailed noise modeled as a subGaussian, alpha-stable process. SubGaussian processes are a special class of multidimensional alpha-stable processes which can efficiently model the presence of outliers, as well as a wide range of dependence structures in time series. We assume that the signal is a complex-valued vector of length L, known only within a multiplicative constant. The dependence structure of the noise, i.e., the underlying matrix of the subGaussian process, is not known. The intent is to implement an adaptive detector in which robust estimates of the noise underlying matrix and the signal strength are obtained from independent, multiple observations. The performance of the proposed adaptive detector is compared to that of an adaptive matched filter that employs Gaussian estimates of the noise covariance matrix and the signal strength [15]. More specifically, the paper is organized as follows: Section 2 provides a brief review of the basic definitions and properties of subGaussian SαS processes. In Section 3, we derive adaptive algorithms for detection of a (within a multiplicative constant) known signal in subGaussian noise of unknown underlying matrix. In Section 4, we illustrate the performance of the proposed detector in a computer simulation study in which we also compare it to the adaptive matched filter performance. We summarize the paper, draw conclusions, and suggest possible future research topics in Section 5.

2.  SUBGAUSSIAN  SYMMETRIC, 
ALPHA-STABLE  PROCESSES 


A subGaussian random vector $X$ can be defined as a random vector with characteristic function of the general form

$$ \phi(\omega) = \exp\left[ -\tfrac{1}{2} (\omega^{T} R \, \omega)^{\alpha/2} \right] \tag{1} $$

where $R$ is a positive-definite matrix. Unfortunately, closed-form expressions for the joint pdf of subGaussian random vectors are known only for the Gaussian (α = 2) and Cauchy (α = 1) cases:

$$ f_G(X) = \frac{1}{\sqrt{(2\pi)^{L} |R|}} \exp\left( -\tfrac{1}{2} X^{T} R^{-1} X \right) \quad \text{(Gaussian)} \tag{2} $$

$$ f_C(X) = \frac{c}{\sqrt{|R|} \, [1 + X^{T} R^{-1} X]^{(L+1)/2}} \quad \text{(Cauchy)} \tag{3} $$

where L is the length of the random vector, $|R|$ is the determinant of $R$ and $c = \Gamma\left(\frac{L+1}{2}\right) / \pi^{(L+1)/2}$.

The following proposition relates Gaussian and subGaussian random vectors and can, in fact, be used to generate subGaussian random deviates [7, pp. 77-84]:

Theorem 1 Any subGaussian random vector is a SαS random vector. In addition, any subGaussian random vector can be expressed in the form

$$ X = w^{1/2} G, \tag{4} $$

where w is a positive (α/2)-stable random variable [7] and G is a Gaussian random vector of mean zero and covariance matrix R.

SubGaussian SαS processes combine the capability to model statistical dependence with the capability to model the presence of outliers of various degrees of severity in observed time series. The example in Fig. 1 is indicative of the concept. Consider a subGaussian vector of length L = 100 and diagonal underlying covariance matrix R = diag{1, 1, ..., 1}. Typical realizations of the vector are shown in Figs. 1(a) and 1(b) for characteristic exponents α = 2 and α = 1.5, respectively. Clearly, it is difficult to distinguish one vector from the other visually. However, if we look over 1000 independent realizations of the first component of the vector, we obtain Figs. 1(c) and 1(d), respectively, in which a clear difference is observed.

Figure 1: Typical realizations of subGaussian random vectors
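
The representation of Theorem 1 suggests a direct way to simulate subGaussian vectors: draw a positive (α/2)-stable scalar and multiply its square root by a Gaussian vector. The sketch below uses Kanter's representation of a positive stable variable; the function names are hypothetical, and the normalization of w (Laplace transform exp(−s^(α/2))) is an assumption of this example rather than something fixed by the text.

```python
import numpy as np

def positive_stable(a, size, rng):
    """Kanter's method: A > 0 with E[exp(-s*A)] = exp(-s**a), for 0 < a <= 1."""
    if a == 1.0:
        return np.ones(size)                       # degenerate case (alpha = 2)
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)
            * (np.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))

def subgaussian_vectors(alpha, R, n, rng=None):
    """Draw n subGaussian vectors X = sqrt(w) * G with underlying matrix R."""
    rng = np.random.default_rng() if rng is None else rng
    L = R.shape[0]
    w = positive_stable(alpha / 2.0, n, rng)       # one scalar per realization
    G = rng.multivariate_normal(np.zeros(L), R, size=n)
    return np.sqrt(w)[:, None] * G                 # shape (n, L)
```

With this normalization, α = 2 returns plain Gaussian vectors with covariance R, and α = 1 with R equal to the identity gives Cauchy components of scale 1/√2.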


The following proposition expresses the underlying matrix of a subGaussian vector in terms of its covariation matrix and can, therefore, be used to obtain high quality estimates of the underlying matrix of the vector from independent observations [7, pp. 89].

Theorem 2 Let $X = [X_1, X_2, \ldots, X_L]^{T}$ be a subGaussian random vector with underlying matrix $R$. Then, its covariation matrix $C$ will consist of the elements

$$ C_{ij} = [X_i, X_j]_{\alpha} = 2^{-\alpha/2} R_{ij} R_{jj}^{(\alpha-2)/2}. \tag{5} $$

Eq.(5) can now be used to compute an estimate of the underlying matrix R from the estimate of the covariation matrix C.


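Theorem 3 Let $\hat{C}_{ij}$, formed as a sum over the observations k = 1, ..., K, be the estimator of the covariation matrix elements, where p < α/2. The estimates

$$ \hat{R}_{jj} = \left[ 2^{\alpha/2} \hat{C}_{jj} \right]^{2/\alpha}, \qquad \hat{R}_{ij} = \frac{2^{\alpha/2} \hat{C}_{ij}}{\hat{R}_{jj}^{(\alpha-2)/2}} \tag{6} $$

are consistent and asymptotically normal with means $R_{jj}$ and $R_{ij}$, respectively, and variances as in [10].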
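
The mapping between the covariation matrix and the underlying matrix in Eqs. (5) and (6) is a simple column-wise rescaling, which the following sketch makes explicit. Both function names are hypothetical; the covariation estimate itself is taken as an input, since its estimator is not reproduced here.

```python
import numpy as np

def covariation_from_underlying(R, alpha):
    """Eq. (5): C_ij = 2**(-alpha/2) * R_ij * R_jj**((alpha-2)/2)."""
    d = np.diag(R)
    return 2.0 ** (-alpha / 2.0) * R * d[None, :] ** ((alpha - 2.0) / 2.0)

def underlying_from_covariation(C, alpha):
    """Eq. (6): recover R_jj first, then the off-diagonal R_ij."""
    d = (2.0 ** (alpha / 2.0) * np.diag(C)) ** (2.0 / alpha)   # R_jj estimates
    return 2.0 ** (alpha / 2.0) * C / d[None, :] ** ((alpha - 2.0) / 2.0)
```

Applying the second function to the output of the first returns the original matrix, which is a convenient consistency check on an implementation.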

The procedure is illustrated with the following simulation study: Consider a subGaussian random vector of length L = 32 and underlying matrix R = diag{1, 1, ..., 1}. We assume that K = 500 independent realizations of the vector are available and plot the 16th row of the mean over 100 Monte-Carlo simulations of the following two estimates:

$$ \hat{R} = \frac{1}{K} \sum_{k=1}^{K} X_k X_k^{T} \tag{7} $$

and $\hat{R}$ as obtained from the covariation matrix estimators.

We examined the cases of α = 2 and α = 1.5. Clearly, the Gaussian estimate fails when α = 1.5, while the covariation-based estimate maintains high performance in both the cases of α = 2 and α = 1.5.

Figure 2: Illustration of the performance of estimators of the underlying matrix of a subGaussian vector


Next, we consider the estimation of the amplitude of a signal of known shape embedded in subGaussian noise from a number of independent observations. The following Proposition outlines the procedure and states its performance.

Theorem 4 Consider the collection of K vectors $X_k = A s + N_k$, k = 1, 2, ..., K, where $s^{T} s = 1$. Form the least-squares estimates $\hat{A}_k = s^{T} X_k = A + s^{T} N_k$, k = 1, 2, ..., K. Define $\hat{A} = \mathrm{sm}\{\hat{A}_1, \hat{A}_2, \ldots, \hat{A}_K\}$, where sm{···} indicates the sample median of its arguments. The estimate $\hat{A}$ is consistent and asymptotically normal with mean equal to the true signal amplitude A and variance $\frac{1}{K} \left[ \frac{\pi \alpha \gamma^{1/\alpha}}{2 \Gamma(1/\alpha)} \right]^{2}$, where $\gamma = 2^{-\alpha/2} (s^{T} R s)^{\alpha/2}$.
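
The amplitude estimator of Theorem 4 amounts to projecting each observation on the known shape and taking the sample median of the projections. A minimal sketch, assuming real-valued data stored row-wise in a (K, L) array and a hypothetical function name, is:

```python
import numpy as np

def median_amplitude(X, s):
    """Sample median of the least-squares estimates A_k = s^T X_k.

    X : (K, L) array, one observation vector per row
    s : (L,) known signal shape with s @ s == 1
    """
    return np.median(X @ s)
```

The median is used in place of the mean because the projections inherit the heavy tails of the noise, for which the sample mean is unreliable.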


3.  DATA-ADAPTIVE  ALGORITHMS  FOR 
COHERENT  SIGNAL  DETECTION 


We consider the hypothesis testing problem

$$ H_0: \; X_k = N_k, \qquad H_1: \; X_k = S + N_k, \qquad k = 1, 2, \ldots, K $$

where all the vectors have dimension (length) L and k = 1, 2, ..., K indexes independent, identically distributed realizations.

We make the following assumptions:

1. The noise vectors $N_k$ have a subGaussian distribution, i.e., $N_k = w_k^{1/2} G_k$, where $w_k$ is a positive (α/2)-stable random variable of unit dispersion, $G_k$ is a Gaussian random vector of covariance matrix $R$, and $w_k$ and $G_k$ are independent.

2. The signal vector $S = A s$ consists of a known shape s (for which $s^{T} s = 1$) and an unknown amplitude A.

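The proposed test statistic is a generalized likelihood ratio test that makes use of the multidimensional Cauchy pdf defined in Eq.(3):

$$ t_C = \sum_{k=1}^{K} \log \frac{1 + X_k^{T} \hat{R}^{-1} X_k}{1 + (X_k - \hat{A} s)^{T} \hat{R}^{-1} (X_k - \hat{A} s)} \tag{9} $$

For the estimates $\hat{R}$ and $\hat{A}$, we choose the estimates proposed in Eq.(6) and Proposition 4, respectively.

Assuming Gaussian noise of unknown covariance matrix $R$ and unknown signal amplitude, the data-adaptive detector attains the form of an adaptive matched filter [15], i.e., it computes the test statistic

$$ t_G = \frac{2}{K} \sum_{k=1}^{K} (\hat{A} s)^{T} \hat{R}^{-1} X_k - |\hat{A}|^{2} s^{T} \hat{R}^{-1} s, \tag{10} $$

where $\hat{A} = (1/K) \sum_{k=1}^{K} s^{T} X_k$ and $\hat{R} = (1/K) \sum_{k=1}^{K} (X_k - \hat{A} s)(X_k - \hat{A} s)^{T}$.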
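
The two test statistics of this section can be sketched side by side. The code below assumes real-valued data, takes the amplitude and matrix estimates as inputs, and writes the Cauchy statistic as a sum of log-ratios of the quadratic forms; that log form, like the function names, is an assumption of this example rather than a verbatim transcription of Eq. (9).

```python
import numpy as np

def cauchy_glrt(X, s, A_hat, R_hat):
    """Cauchy-based statistic: sum_k log[(1 + q0_k) / (1 + q1_k)].

    q0_k = X_k^T R^-1 X_k, q1_k = (X_k - A s)^T R^-1 (X_k - A s).
    X is (K, L); s is (L,).
    """
    Rinv = np.linalg.inv(R_hat)
    D = X - A_hat * s
    q0 = np.einsum('kl,lm,km->k', X, Rinv, X)
    q1 = np.einsum('kl,lm,km->k', D, Rinv, D)
    return np.sum(np.log((1.0 + q0) / (1.0 + q1)))

def adaptive_matched_filter(X, s, A_hat, R_hat):
    """Gaussian statistic: (2/K) sum_k (A s)^T R^-1 X_k - A^2 s^T R^-1 s."""
    Rinv = np.linalg.inv(R_hat)
    K = X.shape[0]
    return (2.0 / K) * np.sum(X @ Rinv @ (A_hat * s)) - A_hat ** 2 * (s @ Rinv @ s)
```

Either statistic would then be compared with a threshold chosen for the desired false alarm rate.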

4.  COMPUTER  ILLUSTRATION 

The small sample performance of both the Gaussian and the proposed Cauchy detectors can be accurately assessed only via Monte-Carlo simulation. To this end, we chose an observation vector of length L = 8 and K = 10 independent copies of it, while for the signal we chose a shape of a square pulse of height $1/\sqrt{L}$ and an amplitude of A = 1. The subGaussian interference was assumed to be of characteristic exponent α = 2, 1.75, 1.5, 1.25, 1, and 0.75 and underlying matrix R = diag{1, 1, ..., 1}. The performance of the Gaussian and the Cauchy detectors was assessed via 10,000 Monte-Carlo runs.
In Fig. 3, we compare the performance of the Gaussian and the Cauchy detectors for different values of the characteristic exponent α. We see that, for α = 2, the Gaussian detector, as expected, outperforms the Cauchy detector; however, for all other values of α, the Cauchy detector maintains a high performance level, while the performance of the Gaussian detector deteriorates to unacceptably low levels.

Figure 3: Comparison of the small sample performance of the Gaussian (dotted line) and the Cauchy (solid line) detector.


5.  SUMMARY,  CONCLUSIONS,  AND  FUTURE 
WORK 

In this paper, we addressed the problem of detection of a signal, known within a multiplicative constant, in subGaussian impulsive interference of unknown underlying matrix. From this study, we found that the Gaussian detectors for the same problem deteriorate in performance when required to operate in subGaussian interference. On the other hand, a detector based on the multidimensional Cauchy distribution exhibited resistance to the presence of the subGaussian interference and high performance, comparable to the performance of the Gaussian detector in Gaussian interference. Future research in this area should evaluate both the proposed subGaussian interference model and the corresponding detectors on real data sets. Such a process is underway and its results are expected to be announced soon.
SUMMARY

This  paper  emphasizes  the  statistical  properties  of  the  wavelet 
transform  (WT)  and  discusses  some  recent  examples  of 
applications  in  medicine  and  biology. 

The  redundant  forms  of  the  transform  (continuous  wavelet 
transform  (CWT)  and  wavelet  frames)  are  well  suited  for 
detection  tasks  (e.g.,  spikes  in  EEG,  or  microcalcifications  in 
mammograms).  The  CWT,  in  particular,  can  be  interpreted  as  a 
prewhitening  multi-scale  matched  filter.  Redundant  wavelet 
decompositions  are  also  very  useful  for  the  characterization  of 
singularities,  as  well  as  for  the  time-frequency  analysis  of  non¬ 
stationary  signals.  We  briefly  discuss  some  examples  of 
applications  in  phonocardiography,  electrocardiography  (ECG), 
and  electroencephalography  (EEG). 

Wavelet  bases  (WB)  provide  a  similar,  non-redundant 
decomposition  of  a  signal  in  terms  of  the  shifts  and  dilations  of  a 
wavelet  (hierarchical  or  pyramidal  transform).  There  are  also 
non-hierarchical  versions  that  constitute  a  direct  extension  of  the 
traditional block transforms (Fourier, DCT, etc.). This makes
WB  well  suited  for  any  of  the  tasks  for  which  block  transforms 
have  been  used  traditionally:  data  compression,  data  analysis 
(decorrelation),  and  data  processing  (generalized  filtering). 
Wavelets,  however,  may  present  certain  advantages  because 
they  can  improve  the  signal-to-noise  ratio,  while  retaining  a 
certain  degree  of  localization  in  the  time  (or  space)  domain.  We 
present  three  illustrative  examples.  The  first  is  a  straightforward 
denoising  technique  that  applies  a  soft  threshold  in  the  wavelet 
domain.  The  second  is  a  more  refined  version  that  uses 
generalized  Wiener  filtering;  it  was  initially  proposed  for 
reducing  noise  in  evoked  response  potentials.  The  third  is  a 
statistical  method  for  detecting  and  locating  patterns  of  brain 
activity  in  functional  images  acquired  using  magnetic  resonance 
imaging  (fMRI). 

Finally,  we  conclude  by  describing  a  wavelet 
generalization  of  the  classical  Karhunen-Loeve  transform.  In 
particular,  we  provide  the  solution  for  the  optimal 
decomposition  of  a  wide  sense  stationary  process 
(unconstrained  case). 


1.  THREE  TYPES  OF  WAVELET  TRANSFORMS 

The wavelet transform is a linear signal transformation that uses templates $\psi_{a,b}(x) = a^{-1/2} \psi((x-b)/a)$, which are shifted (index b) and dilated versions (index a) of a given wavelet function ψ(x) [11, 53]. The wavelet transform of the signal f ∈ H is parameterized by the scale and shift parameters a and b; it is typically written as

$$ W_{\psi} f(a,b) = \langle f, \psi_{a,b} \rangle \tag{1} $$

where ⟨·,·⟩ is the inner product associated with the Hilbert space H ($\ell_2$ or $L_2$ depending on whether the signal f is discrete or continuous). A basic requirement is that the transform is reversible, that is, that the signal f can be reconstructed from its wavelet coefficients $W_{\psi} f(a,b)$. The distinction between the various types of wavelet transforms depends on the way in which the scale and shift parameters are discretized.

At  the  most  redundant  end,  one  has  the  continuous  wavelet 
transform  (CWT)  for  which  these  parameters  vary  in  a 
continuous  fashion  [20].  This  representation  offers  the 
maximum  freedom  in  the  choice  of  the  analysis  wavelet.  The 
only  requirement  is  that  the  wavelet  satisfies  an  admissibility 
condition;  in  particular,  it  must  have  zero  mean. 

In practice, it is often more convenient to consider the WT for some discretized values of a and b (e.g., the dyadic scales $a = 2^{i}$ and integer shifts b = k with $(i,k) \in Z^{2}$). The transform will be reversible if and only if the corresponding (countable) set of templates defines a wavelet frame (WF) [10, 19, 1]. In other words, the wavelet must be designed such that

$$ \forall f \in H, \quad A \cdot \|f\|^{2} \le \sum_{a,b} |\langle f, \psi_{a,b} \rangle|^{2} \le B \cdot \|f\|^{2} \tag{2} $$

where A and B are two positive constants (framebounds).

A WF is just a redundant version of a wavelet basis (WB)
which can be obtained for the critical sampling rate: $a = 2^i$,
$b = 2^i k$ with $(i,k) \in \mathbb{Z}^2$. In this case, the templates must also
be linearly independent, which imposes even stronger
constraints on the choice of $\psi$. If the frame bounds in (2) are
such that A = B = 1, then the transformation is orthogonal.
Such wavelets can be constructed by starting from a
multiresolution analysis of $L_2$ [26, 27]. The better known
examples are the Daubechies wavelets [9], which are orthogonal
and compactly supported; and the Battle-Lemarié wavelets
which are splines with exponential decay [24, 27]. In the case
of semi- and bi-orthogonal wavelet bases [8, 49, 2], one has the
following signal representation

$$f = \sum_{i \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \langle f, \tilde\psi_{i,k} \rangle\, \psi_{i,k} \qquad (3)$$

with the short form convention $\psi_{i,k} = 2^{-i/2}\,\psi(x/2^i - k)$. The
analysis wavelet $\tilde\psi$ is the dual of $\psi$ (the synthesis wavelet); in
the orthogonal case, both wavelets are identical.

Basic textbook references on the wavelet transform are [11,
29, 53]. For computational issues, we refer the reader to [46].
An  extensive  review  of  its  various  uses  in  medicine  and  biology 
is  given  in  [47];  specific  biomedical  applications  are  also 
described  in  [3]. 

2.  WAVELET  ANALYSIS  AND  FEATURE 
DETECTION 

The redundant forms of the transform (CWT and WF) are
usually preferable for signal analysis, feature extraction, and
detection tasks because they provide a description that is truly
shift-invariant. Next, we discuss some wavelet properties that are of
special interest for this class of applications.

A.  Wavelets  and  time-frequency  analysis 

An analysis wavelet $\psi$ is typically a well localized
bandpass function with a central frequency at $\omega_0$; a standard
requirement is that its time-frequency bandwidth product is
close to the limit specified by the uncertainty principle:
$\Delta_t \Delta_\omega \ge 1/2$. Thus, each analysis template tends to be
predominantly located in a certain elliptical region of the
time-frequency plane centered at $t = b$ and $\omega = \omega_0/a$. The area of
these localization regions is the same for all templates
($(a\,\Delta_t) \times (\Delta_\omega/a)$) and is constrained by the uncertainty
principle. Thus, by measuring the correlation between the
signal and each wavelet template, we obtain a characterization of
its time-frequency content (scalogram). The main difference
with  the  short-time  Fourier  transform  is  that  the  size  of  the 
analysis  window  is  not  constant  for  it  varies  in  inverse 
proportion  to  the  frequency.  This  property  enables  the  wavelet 
transform  to  zoom  in  on  details,  but  at  the  expense  of  a 
corresponding  loss  in  spectral  resolution.  In  this  respect,  we 
should  note  that  most  biomedical  signals  of  interest  include  a 
combination  of  impulse-like  events  (spikes  and  transients)  and 
more  diffuse  oscillations  (murmurs,  EEG  waveforms)  which 
may all convey important information for the clinician. The
short-time Fourier transform or other conventional time-frequency
methods are well adapted for the latter type of events
but  are  much  less  suited  for  the  analysis  of  short  duration 
pulsations.  When  both  types  of  events  are  present  in  the  data, 
the  wavelet  transform  can  offer  a  better  compromise  in  terms  of 
localization.  This  may  explain  its  recent  success  in  biomedical 
signal  processing.  Recent  examples  of  applications  where  time- 
frequency  wavelet  analysis  appears  to  be  particularly  appropriate 
are the characterization of heart beat sounds [22, 21, 31], the
analysis of ECG signals including the detection of late
ventricular potentials [21, 16, 28, 39], the analysis of EEGs
[38, 37, 50], as well as a variety of other physiological signals
[36].

B.  Wavelets  as  a  multi-scale  matched  filter 

In essence, the continuous wavelet transform performs a
correlation analysis, so that we can expect its output to be
maximum when the input signal most resembles the analysis
template $\psi_{(a,b)}$. Consider the measurement model
$f(x) = \varphi_a(x - \Delta x) + n(x)$, where $\varphi_a(x) = \varphi(x/a)$ is a known
deterministic signal at scale a, $\Delta x$ an unknown location
parameter, and $n(x)$ an additive white Gaussian noise
component. Classical detection theory tells us that the optimal
procedure for estimating $\Delta x$ is to perform the correlation with all
possible shifts of our reference template and to select the
position that corresponds to the maximum output (matched
filter). Therefore, it makes sense to use a wavelet transform-like
detector whenever the pattern $\varphi$ that we are looking for can
appear at various scales.
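
The matched-filter procedure described above can be sketched numerically. The example below is only an illustration under stated assumptions: the pattern is taken to be a Gaussian, the scale a is assumed known, and the names `matched_filter_shift`, `template`, and `true_shift` are hypothetical. For a pattern of unknown scale, the same correlation would be repeated over a set of candidate scales, which is what gives the detector its wavelet-like structure.

```python
import numpy as np

def matched_filter_shift(f, template):
    """Return the shift (in samples) that maximizes the correlation of f with template."""
    # mode="full" covers lags from -(len(template) - 1) to len(f) - 1
    corr = np.correlate(f, template, mode="full")
    lags = np.arange(-(len(template) - 1), len(f))
    return int(lags[np.argmax(corr)])

rng = np.random.default_rng(0)
a = 4.0                                        # known scale (assumption)
x = np.arange(-32, 33)
template = np.exp(-0.5 * (x / a) ** 2)         # phi_a(x) = phi(x / a), Gaussian phi (assumption)

true_shift = 100                               # unknown location to be estimated
f = np.zeros(256)
f[true_shift:true_shift + template.size] += template
f += 0.05 * rng.standard_normal(f.size)        # additive white Gaussian noise

est = matched_filter_shift(f, template)
print("estimated shift:", est, "true shift:", true_shift)
```

The estimate is the lag of the correlation peak, so it is quantized to the sampling grid and can be off by a sample or two when the noise level is high relative to the curvature of the correlation peak.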

If the noise is correlated instead of white, then we can get
back to the previous case by applying a whitening filter.
Interestingly, the wavelet-like structure of the detector is
preserved exactly if the noise has a fractional Brownian motion
structure. Specifically, when the noise average spectrum has the
form $\Phi_n(\omega) = \sigma^2/|\omega|^\alpha$ with $\alpha = 2H + 1$, where H is the Hurst
exponent, we can show that the optimum detector is
proportional to the $\alpha$th fractional derivative of the pattern $\varphi$ that
we want to detect. Consequently, for $H > 0$, the optimal detector
is an admissible wavelet even if the initial template $\varphi(x)$ is not
(e.g. it is a lowpass function). For example, the optimal
detector for finding a Gaussian in $O(\omega^{-2})$ noise is the Mexican
hat wavelet (2nd derivative of a Gaussian). As suggested by
Strickland, this is perhaps one of the main reasons why the
wavelet transform works well for detecting microcalcifications
in mammograms [7, 32, 41].
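
The fractional-derivative statement can be traced in the frequency domain. The following is a brief sketch rather than a derivation taken from the cited work: it assumes the standard matched-filter result for colored noise (detector spectrum proportional to the conjugate pattern spectrum divided by the noise power spectrum), and the symbols $\hat h$ (detector) and $\hat\varphi$ (Fourier transform of the pattern) are introduced here for illustration only.

```latex
\[
\hat h(\omega) \;\propto\; \frac{\hat\varphi^{*}(\omega)}{\Phi_n(\omega)}
  \;=\; \frac{|\omega|^{\alpha}}{\sigma^{2}}\,\hat\varphi^{*}(\omega),
\qquad \alpha = 2H + 1 .
\]
```

Multiplication by $|\omega|^{\alpha}$ is, in magnitude, the frequency response of an $\alpha$th-order derivative, and it vanishes at $\omega = 0$ whenever $\alpha > 0$, so the detector has zero mean. For $\alpha = 2$ and a Gaussian pattern, $|\omega|^2 \hat\varphi(\omega)$ is, up to sign, the Fourier transform of the second derivative of the Gaussian, which is the Mexican hat.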

3.  WAVELET  BASES 

Wavelet bases provide a non-redundant decomposition of a
signal in terms of the shifts and dilations of $\psi$ (hierarchical or
pyramidal transform). Hence, it is possible to represent a signal
through its wavelet expansion

$$f = \sum_{i \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} c_{i,k}\, \psi_{i,k} \qquad (4)$$

where the $c_{i,k} = \langle f, \tilde\psi_{i,k} \rangle$ are the wavelet coefficients (scale
index i, and position index k). There are also non-hierarchical
versions (wavelet packets, M-band perfect reconstruction
filterbanks) that constitute a direct extension of the traditional
block transforms (Fourier, DCT, etc.). The important point for
our  purpose  is  that,  in  the  discrete  case,  the  decomposition 
formula  (4)  provides  a  one-to-one  representation  of  the  signal  in 
terms  of  its  wavelet  coefficients  (reversible  linear 
transformation).  This  makes  WB  well  suited  for  any  of  the 
tasks  for  which  block  transforms  have  been  used  traditionally: 
data  compression,  data  analysis  (decorrelation),  and  data 
processing  (generalized  filtering).  Wavelets,  however,  may 
present  certain  advantages  because  they  can  improve  the  signal- 
to-noise  ratio,  while  retaining  a  certain  degree  of  localization  in 
the  time  (or  space)  domain. 

A.  Data  Compression 

Data compression can be achieved by quantization in the
wavelet domain, or by simply discarding certain coefficients
that are insignificant. This form of
orthogonal (or close-to-orthogonal) decomposition has been
used effectively for image compression [25, 4, 14, 40].
Traditionally, this has been one of the primary applications of
wavelets.


B.  Data  Processing:  wavelet  denoising 

One of the first applications of the wavelet transform in
medical imaging was for noise reduction in MR images [54].
The approach proposed by Weaver et al. was to compute an
orthogonal wavelet decomposition of the image and apply the
following soft thresholding rule on the coefficients

$$\tilde c_{i,k} = \begin{cases} \mathrm{sign}(c_{i,k})\,\bigl(|c_{i,k}| - t_i\bigr), & |c_{i,k}| > t_i \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $t_i$ is a threshold that depends on the noise level at the ith
scale; the image is then reconstructed by the inverse wavelet
transform of the $\tilde c_{i,k}$'s. This is essentially the wavelet shrinkage
denoising method later systematized by Donoho and Johnstone
[18, 17], as well as DeVore and Lucier [15]. This algorithm is
extremely simple to implement and works well for moderate
levels of noise. Asymptotically (as the scale goes to zero and as
the noise energy gets distributed over more and more sample
values), it has some interesting min-max optimality properties
for a relatively large class of signals [17].
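
The shrinkage procedure (orthogonal decomposition, soft thresholding of the detail coefficients, inverse transform) can be sketched in a few lines. This is a minimal illustration, not the implementation of [54]: it assumes a 1-D signal, the orthonormal Haar wavelet, a signal length divisible by 2^levels, and a threshold of three noise standard deviations at every scale. All function and variable names are hypothetical.

```python
import numpy as np

def haar_forward(x, levels):
    """Orthonormal Haar decomposition: [approx, coarsest detail, ..., finest detail]."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail (wavelet) channel
        approx = (even + odd) / np.sqrt(2.0)          # approximation channel
    return [approx] + details[::-1]

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def soft_threshold(c, t):
    """Soft thresholding: shrink magnitudes by t, set anything below t to zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, levels, thresholds):
    """Threshold the detail coefficients scale by scale and reconstruct."""
    coeffs = haar_forward(x, levels)
    shrunk = [coeffs[0]] + [soft_threshold(d, t) for d, t in zip(coeffs[1:], thresholds)]
    return haar_inverse(shrunk)

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(128), np.ones(128)])   # piecewise-constant test signal
sigma = 0.2
noisy = clean + sigma * rng.standard_normal(clean.size)

levels = 4
# An orthonormal transform leaves white noise white, so one threshold per scale suffices here.
denoised = denoise(noisy, levels, thresholds=[3.0 * sigma] * levels)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(mse_noisy, mse_denoised)
```

The approximation coefficients are left untouched, so the coarse shape of the signal is preserved and only the detail channels are shrunk.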

The approach can easily be taken one step further by
considering more general pointwise non-linear transformations
$\tilde c_{i,k} = g(c_{i,k})$. Consider the measurement model $c_{i,k} = c^0_{i,k} + n_{i,k}$,
where $c^0_{i,k}$ denotes the wavelet coefficient of the noise-free
signal and $n_{i,k}$ is an independent noise component. In principle
at least, one could apply the optimal Bayesian estimation rule:
$\tilde c_{i,k} = E[c^0_{i,k} \mid c_{i,k}]$, which minimizes the mean square error. This
of course requires the knowledge of the a posteriori probability
density function $p(c^0 \mid c)$, which depends on our a priori
knowledge on $c^0$ and on the noise distribution
($p(n) = p(c \mid c^0)$). We can also constrain ourselves to the class
of linear estimators, and derive the optimal linear estimate

$$\tilde c_{i,k} = \frac{E[c^0_{i,k}\, c_{i,k}]}{E[c_{i,k}^2]}\; c_{i,k} \qquad (6)$$


which  has  the  form  of  a  generalized  Wiener  filter.  This  particular 
algorithm  was  first  proposed  by  Bertrand  et  al.  for  the 
processing  of  evoked  response  potentials  (ERPs)  [5].  These  are 
very  noisy  signals  with  a  strong  deterministic  component. 
Because  ERPs  are  usually  acquired  using  multiple  trials  (typ. 
100-600  repetitions),  the  optimal  weighting  factors  in  (6)  can  be 
estimated  on  a  coefficient-by-coefficient  basis  in  an  initial 
training  phase,  or  even  updated  recursively.  In  this  particular 
application,  the  wavelet  transform  appears  to  be  superior  to  the 
Fourier  transform,  the  latter  being  optimal  only  when  both  the 
signal  and  noise  are  stationary  (conventional  Wiener  filter). 
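
The coefficient-by-coefficient estimation of the weights in (6) from repeated trials might look as follows. This is a hedged sketch, not the procedure of [5]: it assumes zero-mean noise that is independent from trial to trial, so that the average product of two distinct trials estimates $E[c^0 c]$, and it clips the weights to [0, 1]. The function name `wiener_weights` and the synthetic data are hypothetical.

```python
import numpy as np

def wiener_weights(trials):
    """Estimate the weights E[c0 * c] / E[c^2] for each coefficient.

    trials: array of shape (J, K) holding K wavelet coefficients for each of J trials.
    """
    n_trials = trials.shape[0]
    power = np.mean(trials ** 2, axis=0)                  # estimate of E[c^2]
    total = trials.sum(axis=0)
    # Mean product over pairs of distinct trials: the noise terms average out
    # (assumption: noise independent across trials), leaving an estimate of E[c0 * c].
    cross = (total ** 2 - (trials ** 2).sum(axis=0)) / (n_trials * (n_trials - 1))
    return np.clip(cross / power, 0.0, 1.0)

rng = np.random.default_rng(0)
c0 = np.zeros(64)
c0[:4] = [5.0, -4.0, 3.0, 2.0]                            # sparse noise-free coefficients
trials = c0 + rng.standard_normal((200, c0.size))         # 200 noisy repetitions

weights = wiener_weights(trials[:-1])                     # training phase
new_trial = trials[-1]                                    # held-out single trial
mse_raw = np.mean((new_trial - c0) ** 2)
mse_weighted = np.mean((weights * new_trial - c0) ** 2)
print(mse_raw, mse_weighted)
```

Coefficients that carry mostly noise receive weights near zero, while coefficients dominated by the deterministic component are passed almost unchanged.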

C.  Data  Analysis:  detecting  changes  in  fMRI 

Functional  neuroimaging  is  a  fast  developing  area  aimed  at 
investigating  the  neuronal  activity  of  the  brain  in  vivo.  The  data 
for  those  studies  is  provided  by  positron  emission  tomography 
(PET),  and  functional  magnetic  resonance  imaging  (fMRI). 
PET  measures  the  spatial  distribution  of  certain  function-specific 
radiotracers  injected  into  the  bloodstream  prior  to  imaging.  A 
typical example is the measurement of cerebral glucose
utilization with the tracer [$^{18}$F]2-fluoro-2-deoxy-D-glucose
(FDG). fMRI, which is a more recent technique, allows for a
visualization  of  local  changes  in  blood  oxygenation  believed  to 
be  induced  by  neuronal  activation.  It  is  substantially  faster  than 
PET  and  also  offers  better  spatial  resolution.  Yet,  there  is  still 
disagreement  among  specialists  concerning  the  exact  nature  of 
the  biological  processes  that  produce  the  observed  changes  in 
the  MR  signal. 

The  functional  images  obtained  with  those  two  modalities 
are  extremely  noisy  and  variable,  and  their  interpretation 
requires the use of statistical analysis methods [51]. What is
typically of interest is the detection of the differences of activity
between  different  groups  of  subjects  (e.g.  normal  versus 
diseased)  or  between  different  experimental  conditions  with  the 
same  subject  (e.g.  rest  versus  word  generation).  In  either  case, 
the  variability  of  the  signal  is  such  that  multiple  subjects  or 
repeated  trials  are  required  in  each  subgroup. 

The  first  step  in  this  analysis  is  to  register  the  various 
images  so  that  they  can  be  compared  on  a  pixel-by-pixel  basis 
[42].  The  second  step  is  to  compute  the  difference  between  the 
aligned  group  averages  and  perform  the  statistical  analysis. 
Testing  in  the  image  domain  directly  is  difficult  because  of  the 
amount  of  residual  noise  and  the  necessity  to  use  a  very 
conservative  significance  level  to  compensate  for  multiple  testing 
(one  test  per  pixel!).  A  better  solution  is  to  perform  the  testing 
in  the  wavelet  domain  [35,  33,  51].  The  main  advantage  is  that 
the  discriminative  information,  which  is  smooth  and  well 
localized  spatially,  becomes  concentrated  into  a  relatively  small 
number  of  coefficients,  while  the  noise  remains  evenly  divided 
among  all  coefficients.  In  addition,  the  number  of  statistical 
tests  can  be  reduced  considerably  by  first  identifying  the  few 
wavelet  channels  that  contain  significant  differences.  A  recent 
application  of  this  technique  to  fMRI  is  presented  in  [34]. 

4.  EXTENSION  OF  THE  KARHUNEN-LOEVE 
TRANSFORM 

One  stage  of  the  fast  wavelet  transform  algorithm  can  be 
conveniently  described  as  a  multivariate  filtering  operation  using 
the so-called polyphase representation [53]. The corresponding
filterbank system is shown in Fig. 1.

Fig. 1: Polyphase representation of a P-band wavelet analysis filterbank.


In this diagram, x(k) represents the input signal and the y's are
the various wavelet channels after one level of decomposition.
In the standard dyadic case, there are only two channels (P = 2),
but the concept is also valid for larger values of P (P-band
perfect reconstruction filterbank) [52, 53]. It turns out that the
transformation is orthogonal if and only if the P×P transfer
function matrix H(z) satisfies the paraunitary condition:

$$H(z)\,H^T(z^{-1}) = I_P \qquad (7)$$


where $I_P$ is the P×P identity matrix. Note that for traditional
block transforms, the matrix H(z) does not depend on z (i.e.,
the various blocks are processed independently of each other).
In order to design the optimal wavelet transform for a given
class of input signals, it is therefore natural to seek the
paraunitary matrix H(z) that provides the maximum energy
compaction in the wavelet domain [44]. If the matrix H is
constrained to be real (no delays), the solution corresponds to
the classical Karhunen-Loeve transform (KLT). If we allow for
more general structures (for example, H(z) is an N-point FIR
transfer function), we can get better results but the filter
optimization subject to constraint (7) is a rather difficult task
[44, 30, 6, 13]. One interesting property of the optimal solution
is that the transformed components are uncorrelated; however,
this is not a sufficient condition for optimality, in contrast with
the standard KLT [44].
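
For the delay-free case, the construction can be illustrated numerically. The sketch below assumes a two-sample block (P = 2) taken from a unit-variance stationary process whose neighboring samples have correlation `rho` (an illustrative value, not from the paper). It computes the KLT as the eigenvector matrix of the block covariance, checks that the constant matrix H satisfies the paraunitary condition in its degenerate form H H^T = I, and evaluates the coding gain as the ratio of the arithmetic to the geometric mean of the channel variances.

```python
import numpy as np

rho = 0.9                                   # correlation between adjacent samples (assumption)
P = 2
R = np.array([[1.0, rho],
              [rho, 1.0]])                  # covariance of a block of P = 2 samples

eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues returned in ascending order
H = eigvecs.T[::-1]                         # rows = eigenvectors, largest variance first

# H does not depend on z, so the paraunitary condition reduces to orthogonality.
is_orthogonal = np.allclose(H @ H.T, np.eye(P))

D = H @ R @ H.T                             # covariance of the transformed components
variances = np.diag(D)
coding_gain = variances.mean() / np.prod(variances) ** (1.0 / P)

print(is_orthogonal)
print(np.round(D, 6))                       # diagonal: components are uncorrelated
print(coding_gain)                          # equals 1 / sqrt(1 - rho^2) in this 2x2 case
```

In this two-sample case the KLT basis vectors are, up to sign, the normalized sum and difference of the two samples, that is, the Haar filters. The channel variances are 1 + rho and 1 - rho.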

If we do not impose any order constraint on H(z), it is
possible to derive the optimal solution analytically for any given
wide sense stationary process with spectral power density
$S_x(\omega)$. The two channel case is considered in [45]; the more
general P-band case is treated in [43] using an elegant principal
component formulation in the frequency domain. In each case,
the solution depends on the spectral characteristics of the input
signal and has the form of an ideal filter with pure "on" and
"off" frequency bands. If the power spectral density is
non-increasing, then the optimal solution is the ideal filterbank with
P uniformly-spaced subbands. Interestingly, there are a number
of wavelet transform constructions that converge asymptotically
to this limit. The better known example is the family of
Battle-Lemarié spline wavelets which converge to an ideal bandpass
filter as the order of the spline goes to infinity [24, 48, 2].
Daubechies wavelets also exhibit similar convergence properties
[23]. This partially explains why higher order wavelets usually
result in smaller approximation errors.

These  unconstrained  solutions  are  primarily  of  interest 
from  a  theoretical  point  of  view.  For  example,  they  can  be  very 
useful  for  deriving  asymptotic  bounds  on  the  best  performance 
achievable  (e.g.  coding  gain  over  PCM)  [12].  They  are  less 
relevant  for  implementation  purposes  because  of  the 
disadvantages  of  ideal  filterbanks  (slowly  decaying  impulse 
responses,  Gibbs  oscillations).  This  provides  a  good  motivation 
for  investigating  more  constrained  solutions.  As  far  as  we 
know,  there  is  not  yet  any  general  procedure  for  designing 
optimal  FIR  wavelets  that  is  entirely  satisfactory;  this  is 
currently an active area of research.

Abstract

The problem of estimating the speed of a particle of air passing
through the region of interference fringes generated by
two coherent laser beams is addressed. The signal detected
is of the form $A \exp\{-2\alpha^2 f_d^2 t^2\} \cos(2\pi f_d t)$, where the
Doppler frequency $f_d$ is related to the particle's velocity. This
paper is concerned with the most accurate estimation of the
parameters A and $f_d$ in the model considered. Cramer-Rao
bounds on the accuracy of estimates of A and $f_d$ are
derived and closed-form expressions are given. Approximated
formulas provide quantitative insights into the influence
of $\alpha$ and $f_d$. Additionally, a Maximum Likelihood
Estimator is presented. Numerical examples illustrate the
performance of the MLE and compare it with the CRB. The
influence of the SNR, the sample size, the optical parameter
$\alpha$ and the frequency $f_d$ on the estimation performance
is emphasized.

1.  Problem  statement 

Laser velocimeters have gained popularity in fluid
mechanics applications, where they have been used to
estimate particle velocity in a flow [1][2], mainly for
measurements in wind tunnels. Since this system provides
a non-intrusive and reliable way of measuring
local velocities in fluid flows, it has become an interesting
alternative to mechanical systems, for instance
in situations where one does not want to disturb the
flow. Furthermore, these systems have been reported
to yield precise estimates. Examples of laser velocimetry
applications include analysis of flow surrounding
the blade tips of a hovering rotor, and measurements of
mean velocity and turbulence intensity in unsteady
ultrasonic flow. In aeronautics applications, there is a
vital need for a reliable aircraft speed measurement
system. Moreover, this system must fulfil severe
constraints regarding size, weight, accuracy and
robustness. With the emergence of a new generation
of cheap and small laser diodes, laser anemometers
become a conceivable and promising technique for
on-board measurement of aircraft speed. The principle
of such a system is now briefly described. Two coherent
laser beams are crossed and focused in the vicinity
of the aircraft. They generate a symmetric ellipsoidal
probe volume composed of equidistant bright and dark
fringes. As a particle of air passes through this region,
the fringes will cause it alternately to scatter and not
to scatter light, according to the particle's velocity and
interfringe width. More exactly, the signal received by
the photodetector can be shown [1] to have the form

a:(i)  =  ^^cos(27ry^)  H-^t) 

$$= A\,s(t) + w(t), \qquad t = 0, \pm 1, \ldots, \pm T \qquad (1)$$

where V represents the particle's velocity, 2W is the
total length of the interference fringes, and I denotes the
interfringe width. The amplitude A depends on the
particle's size, the power emitted and optical transmission
coefficients. In (1) the additive noise {w(t)} is assumed
to be a sequence of i.i.d. Gaussian variables with
variance $\sigma_w^2$. It should be pointed out that the Gaussian
shape of the time-varying amplitude is directly
induced from the Gaussian shape of the intensity
distribution within the laser beam. In what follows,
we use the optical parameter $\alpha$ and let $f_d = V/I$ denote the
"Doppler" frequency, so that s(t) in (1) can be rewritten
as $s(t) = \exp\{-2\alpha^2 f_d^2 t^2\} \cos(2\pi f_d t)$. Here, we are
concerned with the best accuracy that can be achieved
when estimating the parameters A and $f_d$ in (1). It
should be noted that the model studied here belongs
to the class of amplitude modulated sinusoidal signals
(see [3] for a thorough overview of multiplicative
models). However, in contrast with most approaches, the
time-varying amplitude cannot be viewed just as a
perturbation term. Moreover, the amplitude and phase
are not decoupled from one another, as they both carry
information about the frequency of interest.

2. Cramer-Rao Bounds

Let $\theta = [A, f_d, \sigma_w^2]^T$ denote the parameter vector to be
estimated from the measurements $x = [x(-T), \ldots, x(0), \ldots, x(T)]^T$. For
later use, we define
$s = [s(-T), \ldots, s(0), \ldots, s(T)]^T$ and
$w = [w(-T), \ldots, w(0), \ldots, w(T)]^T$ so that (1) can be
written in the following compact form

$$x = A\,s + w \qquad (2)$$


2.1.  Exact  CRB 

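Under the white Gaussian assumption for w(t), the
log-likelihood function is given by [4]

$$\Lambda(x,\theta) = \mathrm{cte} - \frac{2T+1}{2}\ln\sigma_w^2 - \frac{1}{2\sigma_w^2}\,\|x - A\,s\|^2 \qquad (3)$$

Twice differentiating (3) with respect to $\theta$ and taking
expectations, it is straightforward to show (see [5] for details)
that the Fisher Information Matrix (FIM) is given by

$$F = \frac{1}{\sigma_w^2}\begin{bmatrix} s^T s & A\,s^T s'_f & 0 \\ A\,s'^T_f s & A^2\,s'^T_f s'_f & 0 \\ 0 & 0 & \dfrac{2T+1}{2\sigma_w^2} \end{bmatrix} \qquad (4)$$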

where $s'_f = \partial s / \partial f_d$. Inverting (4), the diagonal terms of
the Cramer-Rao Bound (CRB) are obtained as

$$CRB(\sigma_w^2) = \frac{2\sigma_w^4}{2T+1} \qquad (5)$$

$$CRB(A) = \sigma_w^2\,\frac{s'^T_f s'_f}{(s^T s)(s'^T_f s'_f) - (s^T s'_f)^2} \qquad (6)$$

$$CRB(f_d) = \frac{\sigma_w^2}{A^2}\,\frac{s^T s}{(s^T s)(s'^T_f s'_f) - (s^T s'_f)^2} \qquad (7)$$

which provides closed-form expressions of the CRB.
The influences of $\alpha$ and $f_d$ are of interest as they can
guide the selection of the sampling frequency and the
optical parameters. However, it turns out that an
analytical study from (7) of the dependence of $CRB(f_d)$
on $\alpha$ and $f_d$ is intractable, the derivatives
$\partial CRB(f_d)/\partial\alpha$ and $\partial CRB(f_d)/\partial f_d$
being difficult to interpret. This influence
will therefore be evaluated numerically. However,
further insights into the analysis of the CRB can be gained
by considering the large-sample case and approximated
formulas for the FIM, as shown in the next section.

2.2. Approximated CRB

The aim of this section is to get simplified expressions
for the CRB which could provide direct relations
between $CRB(f_d)$ and the parameters $\alpha, f_d, A, \sigma_w^2$. We
consider that T is "large" (as $\exp\{-2\alpha^2 f_d^2 t^2\}$ decays
very quickly, this assumption is not restrictive). First,
observe that the asymptotic FIM depends on the
quantities $\lim s^T s$, $\lim s^T s'_f$ and $\lim s'^T_f s'_f$. Therefore,
it will consist of a combination of terms of the forms
$\sum_{t=-T}^{T} \exp\{-4\alpha^2 f_d^2 t^2\}\, t^n \cos(4\pi f_d t)$ and
$\sum_{t=-T}^{T} \exp\{-4\alpha^2 f_d^2 t^2\}\, t^n \sin(4\pi f_d t)$. To get
further insights into their values, we propose to use the
following approximation:

$$\lim_{T \to \infty} \sum_{t=-T}^{T} \exp\{-4\alpha^2 f_d^2 t^2\}\, t^n \begin{Bmatrix} \cos \\ \sin \end{Bmatrix}(4\pi f_d t) \simeq \int_{-\infty}^{\infty} \exp\{-4\alpha^2 f_d^2 t^2\}\, t^n \begin{Bmatrix} \cos \\ \sin \end{Bmatrix}(4\pi f_d t)\, dt \qquad (8)$$

This corresponds to a rectangular approximation of
the integral. More intuitively, consider either a signal
$x(t) = \exp\{-4\alpha^2 f_d^2 t^2\}\, t^n$ or a random process with
autocorrelation $r(\tau) = \exp\{-4\alpha^2 f_d^2 \tau^2\}\, \tau^n$. The left-hand (resp.
right-hand) sides of (8) are the real and imaginary
parts of the Discrete Time (resp. Continuous Time)
Fourier Transform of these sequences, evaluated at $2 f_d$.
Hence, (8) amounts to saying that the DTFT fairly
approximates the CTFT, which is a common hypothesis.
However, it is only an approximation and the ensuing
expressions are not exact. Nevertheless, as will be
illustrated by numerical examples, it is a very accurate
approximation. Based on (8), it can be shown [5] that

$$\lim_{T \to \infty} s^T s \simeq \frac{\sqrt{\pi}}{4\,\alpha f_d}\left[1 + \exp\left\{-\frac{\pi^2}{\alpha^2}\right\}\right] \qquad (9)$$

$$\lim_{T \to \infty} s^T s'_f \simeq -\frac{\sqrt{\pi}}{8\,\alpha f_d^2}\left[1 + \exp\left\{-\frac{\pi^2}{\alpha^2}\right\}\right] \qquad (10)$$

$$\lim_{T \to \infty} s'^T_f s'_f \simeq \frac{\sqrt{\pi}}{16\,\alpha^3 f_d^3}\left[(3\alpha^2 + 2\pi^2) + (3\alpha^2 - 2\pi^2)\exp\left\{-\frac{\pi^2}{\alpha^2}\right\}\right] \qquad (11)$$

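Furthermore, we note that, in general, $\alpha$ (the inverse
of the number of interference fringes) is small, which
implies that terms proportional to $\exp\{-\pi^2/\alpha^2\}$ are
negligible compared to 1. By neglecting these terms in
(9)-(11), the (approximated) asymptotic FIM
corresponding to $[A, f_d]^T$ is given by

$$F_{[A,f_d]} \simeq \frac{\sqrt{\pi}}{\sigma_w^2}\begin{bmatrix} \dfrac{1}{4\,\alpha f_d} & -\dfrac{A}{8\,\alpha f_d^2} \\ -\dfrac{A}{8\,\alpha f_d^2} & \dfrac{A^2\,(3\alpha^2 + 2\pi^2)}{16\,\alpha^3 f_d^3} \end{bmatrix} \qquad (12)$$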

With this simplification, inverting (12) and rearranging
terms, we obtain:

$$CRB(A) \simeq \sigma_w^2\,\frac{2\,(3\alpha^2 + 2\pi^2)\,\alpha f_d}{\sqrt{\pi}\,(\alpha^2 + \pi^2)} \qquad (13)$$

$$CRB(f_d) \simeq \frac{\sigma_w^2}{A^2}\,\frac{8\,\alpha^3 f_d^3}{\sqrt{\pi}\,(\alpha^2 + \pi^2)} \qquad (14)$$

The approximated formulas (13) and (14) are believed
to be of interest from both a theoretical and practical
point of view. It will be shown in Section 4 that they
provide very accurate approximations of (6) and (7).
These formulas give a direct expression of the CRB as
a function of $\alpha$ (which is a design parameter) and $f_d$.
It should be pointed out first that the CRB tends to
a constant as the sample size goes to infinity. This is
in contrast with most estimation problems where the
Cramer-Rao bound is usually of order O(1/T). Therefore,
for T above a threshold, it can be expected that
the CRB will not decrease (hence no improvement is
achieved): this will be illustrated in Section 4. Note
also that $CRB(f_d)$ is roughly proportional to $\alpha^3 f_d^3$
whereas $CRB(A)$ is proportional to $\alpha f_d$. Therefore,
increasing the probe volume by a factor of 10 could
possibly result in a gain of 1000 on the variance of the
frequency estimate. Additionally, observe that

$$\frac{\partial CRB(f_d)}{\partial \alpha} \simeq \frac{\sigma_w^2}{A^2}\,\frac{8\,f_d^3\,\alpha^2\,(\alpha^2 + 3\pi^2)}{\sqrt{\pi}\,(\alpha^2 + \pi^2)^2} \qquad (15)$$

$$\frac{\partial CRB(f_d)}{\partial f_d} \simeq \frac{\sigma_w^2}{A^2}\,\frac{24\,\alpha^3 f_d^2}{\sqrt{\pi}\,(\alpha^2 + \pi^2)} \qquad (16)$$

which implies that $CRB(f_d)$ monotonically increases
with $\alpha$ and $f_d$.


3. Maximum Likelihood Estimation

In this section, we derive maximum likelihood estimators
of A and $f_d$ in the model (1). For any given value
of $f_d$, $\Lambda(x,\theta)$ in (3) being a quadratic function of the
parameter A, the optimization w.r.t. A reduces to a
simple least-squares problem and leads to

$$\hat A = \frac{x^T s}{s^T s} \qquad (17)$$

$\hat A$ will be the MLE of A if $f_d$ is replaced by its ML
estimate in (17). Substituting (17) into (3), the ML estimate
of $f_d$ is then found to be the solution of the following
minimization problem:

$$\hat f_d = \arg\min_f J_1(f) \qquad (18)$$

$$J_1(f) = \left\| x - \frac{x^T s}{s^T s}\, s \right\|^2 = \|e\|^2 \qquad (19)$$

Since $J_1(f)$ is a non-linear function of f, no analytical
solution for the problem exists and one has to resort
to numerical methods [6]. In an attempt to provide a
computationally efficient algorithm, we propose to use
a Gauss-Newton procedure which uses exact first-order
derivatives and approximated second-order derivatives.
Therefore, only $J_1'(f) = \partial J_1 / \partial f$ needs to be computed.
The derivative $e' = \partial e / \partial f$ can be written as

$$e' = -\frac{1}{(s^T s)^2}\left\{\left[(x^T s')(s^T s) - 2\,(x^T s)(s^T s')\right] s + (x^T s)(s^T s)\, s'\right\} \qquad (20)$$

Hence

$$J_1'(f) = 2\,e'^T e = \frac{2\,(x^T s)}{(s^T s)^2}\left\{(x^T s)(s^T s') - (x^T s')(s^T s)\right\} \qquad (21)$$

The Gauss-Newton method makes use of the following
approximation for the Hessian

$$J_1''(f) \simeq 2\,e'^T e' \qquad (22)$$

The frequency is thus estimated in an iterative way

$$f^{(n+1)} = f^{(n)} - \left[e'^T e'\right]^{-1} e'^T e\,\Big|_{f = f^{(n)}} \qquad (23)$$

The iterations are stopped whenever $|f^{(n+1)} - f^{(n)}| < \delta$,
where $\delta$ is a user defined parameter. In order
to avoid possible convergence towards a local minimum,
care is to be taken in order to properly initialize the
algorithm. In the simulations presented in the next
section, a Fast Fourier Transform of the data followed
by a coarse search for the maximum is used.
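
The estimation procedure of this section (FFT-based initialization followed by Gauss-Newton iterations on $J_1(f)$, then the least-squares amplitude) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the SNR is assumed to be $A^2/\sigma_w^2$, the zero-padding length `nfft`, the tolerance `delta` and the iteration cap are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np

def burst_model(f, alpha, t):
    """Return s(t) = exp(-2 alpha^2 f^2 t^2) cos(2 pi f t) and its derivative ds/df."""
    env = np.exp(-2.0 * alpha**2 * f**2 * t**2)
    c, sn = np.cos(2.0 * np.pi * f * t), np.sin(2.0 * np.pi * f * t)
    s = env * c
    ds = env * (-4.0 * alpha**2 * f * t**2 * c - 2.0 * np.pi * t * sn)
    return s, ds

def fft_init(x, nfft=8192):
    """Coarse search: frequency (cycles/sample) of the largest zero-padded FFT bin."""
    spectrum = np.abs(np.fft.rfft(x, nfft))
    spectrum[0] = 0.0                                   # ignore the DC bin
    return np.fft.rfftfreq(nfft)[np.argmax(spectrum)]

def gauss_newton_mle(x, alpha, t, f0, delta=1e-10, max_iter=100):
    """Gauss-Newton minimization of J1(f) = ||x - (x's / s's) s||^2."""
    f = f0
    for _ in range(max_iter):
        s, ds = burst_model(f, alpha, t)
        ss, xs = s @ s, x @ s
        e = x - (xs / ss) * s                           # residual
        # derivative of the residual with respect to f
        de = -(((x @ ds) * ss - 2.0 * xs * (s @ ds)) * s + xs * ss * ds) / ss**2
        f_new = f - (de @ e) / (de @ de)                # Gauss-Newton update
        converged = abs(f_new - f) < delta
        f = f_new
        if converged:
            break
    s, _ = burst_model(f, alpha, t)
    return f, (x @ s) / (s @ s)                         # frequency and amplitude estimates

rng = np.random.default_rng(1)
T, alpha, f_true, A_true = 300, 0.122857, 0.05, 1.0
sigma2 = A_true**2 / 10 ** (15 / 10)                    # 15 dB, assuming SNR = A^2 / sigma_w^2
t = np.arange(-T, T + 1, dtype=float)
s_true, _ = burst_model(f_true, alpha, t)
x = A_true * s_true + np.sqrt(sigma2) * rng.standard_normal(t.size)

f_hat, A_hat = gauss_newton_mle(x, alpha, t, fft_init(x))
print(f_hat, A_hat)
```

Because the Hessian approximation is non-negative, each update moves in a descent direction, but the iteration is only reliable when the initial guess lies within the main lobe of the cost function, which is why the coarse search matters.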

4.  Numerical  examples  and  conclusions 

In this section, we present some numerical examples in
which we compare the CRB derived in Section 2 with
the performance of the MLE. Since the FFT-based
estimate is also available as the initial step of the MLE
and because it is the most intuitive way for spectral
estimation, we will also compare its performance with
the CRB. Additionally, we provide a comparison
between exact and approximated CRB and we illustrate
the influence of various parameters on the estimation
performance. We concentrate on the estimation of the
frequency $f_d$, which directly provides the particle's velocity.
The value of $\alpha$ is selected as $\alpha = 0.122857$ and A = 1
throughout the simulations. The Signal to Noise Ratio
(SNR)  is  defined  as  SNR  =  First,  we  study  the 
influence of the number of samples T. Figures 1 and 2
show the MSE of the estimates versus T for different
values of $f_d$ and with SNR = 15 dB. Exact CRB (given
by (7)) are shown in solid lines whereas approximated
CRB (see (14)) appear in dashed-dotted lines. From
these figures, it can be seen that the MLE has a
performance very close to the CRB and superior to the
FFT estimator. The approximate formula (14) gives
a very accurate approximation of the exact CRB, as
long as T is large enough (which is an expected result
since the approximated formula is "asymptotic" in T).
However, the number of samples needed for the two
expressions to be equal is reasonable (this number
decreases while $f_d$ increases). Note also that, when T
increases above a threshold (typically $T > 1/(\alpha f_d)$), no
improvement is achieved. This is due to the fact that,
for large t, $\exp\{-2\alpha^2 f_d^2 t^2\} \approx 0$: hence, the signal essentially
contains noise.
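
The agreement between the exact bound (7) and the approximation (14), and the saturation of the bound for large T, can be checked numerically. The sketch below makes some assumptions that are not in the text: the SNR is taken as $A^2/\sigma_w^2$, and the values of T are chosen only for illustration. Function names are hypothetical.

```python
import numpy as np

def crb_fd_exact(alpha, fd, A, sigma2, T):
    """Exact CRB on f_d from (7), using the sampled signal s and its derivative."""
    t = np.arange(-T, T + 1, dtype=float)
    env = np.exp(-2.0 * alpha**2 * fd**2 * t**2)
    s = env * np.cos(2.0 * np.pi * fd * t)
    ds = env * (-4.0 * alpha**2 * fd * t**2 * np.cos(2.0 * np.pi * fd * t)
                - 2.0 * np.pi * t * np.sin(2.0 * np.pi * fd * t))
    ss, sd, dd = s @ s, s @ ds, ds @ ds
    return sigma2 / A**2 * ss / (ss * dd - sd**2)

def crb_fd_approx(alpha, fd, A, sigma2):
    """Large-sample approximation (14)."""
    return sigma2 / A**2 * 8.0 * alpha**3 * fd**3 / (np.sqrt(np.pi) * (alpha**2 + np.pi**2))

alpha, fd, A = 0.122857, 0.05, 1.0
sigma2 = A**2 / 10 ** (15 / 10)          # 15 dB, assuming SNR = A^2 / sigma_w^2

approx = crb_fd_approx(alpha, fd, A, sigma2)
exact_small = crb_fd_exact(alpha, fd, A, sigma2, T=50)
exact_large = crb_fd_exact(alpha, fd, A, sigma2, T=1000)

for T in (50, 100, 200, 400, 1000):
    print(T, crb_fd_exact(alpha, fd, A, sigma2, T), approx)
```

For these parameter values $1/(\alpha f_d)$ is roughly 160 samples, so the exact bound is expected to stop decreasing once T is a few hundred and to settle at the value given by (14).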


[Figure: mean-square error of frequency estimate versus number of samples]

Fig. 1. CRB and performance of MLE and FFT estimators
versus T. SNR = 15 dB. $f_d$ = 0.05. A = 1.

[Figure: mean-square error of frequency estimate versus number of samples]

Fig. 2. CRB and performance of MLE and FFT estimators
versus T. SNR = 15 dB. $f_d$ = 0.15. A = 1.

We now investigate the dependence of CRB(fd) on
fd and α in Figures 3 and 4. As can be seen, the CRB increases
with fd or α, which was expected from (15) and (16).
As fd (or α) increases, the bandwidth of the time-varying
amplitude of s(t) increases, which in
turn complicates the frequency estimation. Finally,
note that CRB(fd, α > 0) > CRB(fd, α = 0), the
latter case corresponding to the constant-amplitude sinusoidal
signal. Therefore, although information about
fd is contained in both the amplitude and the phase of
s(t), this does not improve the estimation compared
with the constant-amplitude case, where the amplitude
does not bring information about the Doppler
frequency.


Fig. 3. CRB and performance of MLE and FFT estimators versus fd (mean-square error of the frequency estimate). T = 300. SNR = 15 dB. A = 1.

Fig. 4. CRB and performance of MLE and FFT estimators versus α (mean-square error of the frequency estimate versus α). fd = 0.15. T = 350. SNR = 15 dB. A = 1.
ABSTRACT

We  present  a  sequential  test  and  parameter  estimation 
technique  for  measured  seismic  data  from  the  GERESS 
array  situated  in  Germany.  A  new  approximation  of 
the  test  statistic  distribution  and  the  test  threshold  is 
proposed. The sequentially rejecting Bonferroni-Holm
test guarantees a global test level. This avoids
the computationally expensive bootstrap method and
leads to a simpler algorithm. Approximate conditional
maximum likelihood estimates in the frequency
domain  are  used  to  overcome  the  resolution  limits  of 
conventional  methods  for  wideband  signal  processing 
and  to  construct  the  sequential  test.  The  combination 
of  global  optimization  via  genetic  algorithm  and  a  local 
one  using  the  scoring  seems  to  be  a  good  compromise 
to  handle  the  problem  of  cumbersome  maximization  of 
the  log-likelihood  function  over  the  parameters  of  in¬ 
terest.  The  algorithm  for  testing  the  number  of  signal 
phases  is  applied  simultaneously  with  the  estimation  of 
the  model  parameters. 

1.  INTRODUCTION 

Earthquakes and regional events give rise to a number
of different types of waves, e.g. pressure waves, shear
waves, or surface waves. The first waves to be observed
on a seismogram are compressional P-waves ("primary"
waves). S-waves ("secondary") are transverse (shear)
waves with the particle motion in the plane perpendicular
to the direction of propagation. The characteristics
of the various possible types of seismic waves in a
"perfect" medium are well known [1]. The difference in
polarization of seismic waves can be used for the analysis
with an array containing 3-component sensors [7].
In this contribution we analyse the outputs of an array
of vertically sensitive seismometers in order to detect
and separate different phases of a regional seismic
event. General heterogeneities existing along the travel
path and underneath the array reduce the signal coherence
and produce travel-time residuals between the stations
of the array. In this scenario more than
one phase impinges on the array within a short observation
interval. Consequently, approximate conditional
maximum likelihood estimates (ACMLE) in the
frequency domain [3] can be used to resolve different
phases of a seismic event. We combine global optimization
by means of a genetic algorithm with a local
one using scoring in order to handle the
maximization of the log-likelihood function over the parameters
of interest without being too computationally
expensive. The simultaneous use of model parameter
estimation and a testing algorithm with the bootstrap
approximation is investigated in [6] and [4]. General results
on the seismic application of detection algorithms
for narrowband signals can be found in [8]. The sequentially
rejecting Bonferroni-Holm test has been formulated
in [5]. We present an appropriate test statistic
and a method of wideband signal testing.

2.  DATA  MODEL  AND  WAVE 
PARAMETER  ESTIMATION 

We assume that m = 1, ..., M different types of waves
arrive at the array. The positions of the sensors of the
nth station (n = 1, ..., N) can be described by a vector
$r_n$. The outputs of the sensors are Fourier-transformed
with a rectangular window of length T:

=  (1) 

The reception-propagation situation is described by an
(N × M) matrix $H(\omega) = [d_1, \dots, d_M]$.
The vectors $d_1, \dots, d_M$ are the phase
vectors, where

$$k_i = \frac{\omega}{v_i}\,[\cos\phi_i\cos\alpha_i,\ \cos\phi_i\sin\alpha_i,\ \sin\phi_i]'$$

is the wavenumber vector of a wave at frequency $\omega$
with velocity $v_i$, seen at the origin of the array
at azimuth $\alpha_i$ and elevation $\phi_i$. Since only vertically
sensitive seismometers of a planar array are involved
in the analysis, we have $\phi_i = 0$ in this application.
The wavenumber vectors may be written as $k_i = \omega\,\xi_i$
(i = 1, ..., M), where $\xi_i$ is the slowness vector. The parameters
of interest are the M slowness vectors, i.e. $\underline{\xi} =
(\xi_1', \dots, \xi_M')'$, the spectral parameters of the sources,
and the spectral parameters of the noise. Approximately, one
can express the sensor output vector for each discrete
frequency by

$$X(\omega_k) = H(\omega_k)\,S(\omega_k) + U(\omega_k), \qquad (2)$$

where $S(\omega_k)$ and $U(\omega_k)$ are the Fourier transforms of
the signals and the noise. We assume that $H(\omega)$ and the spectral
power of the noise $\nu(\omega)$ change slowly with $\omega$, and that
frequencies $\omega^i$ (i = 1, ..., P) suffice to describe the behavior
of $H(\omega)$ and $\nu(\omega)$. If the noise is stationary,
$X(\omega_k)$ given $S(\omega_k)$ is approximately complex normal
with mean $H(\omega^i)S(\omega_k)$ and covariance matrix $\nu(\omega^i)I$.
Wideband ACMLE's maximize

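$$L(\underline{\xi}, S, \nu) = -\sum_{i=1}^{P}\Big[\,pN\log\nu(\omega^i) + \frac{1}{\nu(\omega^i)}\sum_{k}\big\|X(\omega_k) - H(\omega^i)S(\omega_k)\big\|^2\Big],$$

where the inner sum is over the p discrete frequencies
$\omega_k$ around $\omega^i$. Maximization of the function $L(\underline{\xi}, S, \nu)$
over the $S(\omega_k)$ and $\nu(\omega^i)$ leads to the explicit solutions
$\hat{S}(\omega_k)$ and $\hat{\nu}(\omega^i)$ [4]. Then, we can find the ACMLE's $\hat{\underline{\xi}}$ by
minimizing

$$q(\underline{\xi}) = \frac{1}{P}\sum_{i=1}^{P}\log\operatorname{tr}\big[(I - P^i(\underline{\xi}))\,\hat{C}_X^i\big]. \qquad (3)$$

$P^i$ is the projector onto the signal space of the M signals,
$P^i = H(\omega^i)[H^*(\omega^i)H(\omega^i)]^{-1}H^*(\omega^i)$, and

$$\hat{C}_X^i = \hat{C}_X(\omega^i) = \frac{1}{p}\sum_{k} X(\omega_k)\,X^*(\omega_k) \qquad (4)$$

denotes a non-parametric estimate of the spectral density
matrix (SDM). We smooth here over p discrete
frequencies $\omega_k$ around $\omega^i$.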
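
The step from the likelihood to the criterion (3) can be sketched with standard concentrated-likelihood reasoning. The lines below are an illustration, not the authors' printed derivation: they assume the log-likelihood has the usual conditional Gaussian form, with the norm taken over the N sensor outputs, and they drop additive constants.

```latex
\begin{align}
  % Assumed per-band log-likelihood (constants omitted)
  L_i &= -pN\log\nu(\omega^i)
        - \frac{1}{\nu(\omega^i)}\sum_k \big\|X(\omega_k) - H(\omega^i)S(\omega_k)\big\|^2 \\
  % Least-squares solution for the signal spectra
  \hat{S}(\omega_k) &= \big[H^*(\omega^i)H(\omega^i)\big]^{-1}H^*(\omega^i)\,X(\omega_k) \\
  % Residual energy expressed through the projector and the smoothed SDM estimate
  \sum_k \big\|(I - P^i)X(\omega_k)\big\|^2 &= p\,\operatorname{tr}\big[(I - P^i)\hat{C}_X^i\big] \\
  % Setting dL_i/d\nu = 0
  \hat{\nu}(\omega^i) &= \frac{1}{N}\operatorname{tr}\big[(I - P^i)\hat{C}_X^i\big] \\
  % Concentrated likelihood: maximizing L is minimizing q
  \max_{S,\nu} L &= -pN\sum_{i=1}^{P}\log\hat{\nu}(\omega^i) + \text{const}
                  = -pNP\,q(\underline{\xi}) + \text{const}
\end{align}
```

Under these assumptions the factor pNP in the last line is the same factor that appears in the likelihood-ratio statistic of the next section.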

3.  SEQUENTIAL  DETECTION 

The main idea of the sequential test is to detect the
strongest signal, extract it, and continue this procedure
until the hypothesis that there exists no further signal
is accepted. The approximate conditional maximum
likelihood ratio test (ACMLRT) for the hypothesis
that there is no (m+1)st signal if m signals are already
detected results, in analogy to [4], in the test statistic

$$t_{m+1}(X) = 2pPN\,\big[-q_{m+1}(\hat{\underline{\xi}}_{m+1}) + q_m(\hat{\underline{\xi}}_m)\big]. \qquad (5)$$

Here $\hat{\underline{\xi}}_{m+1}$ is the vector of wave parameter estimates
for m+1 signals, and $q_{m+1}$ and $q_m$ are defined as in (3)
for m+1 and m signals, respectively. The assumption
that

$$\inf_{\underline{\xi}_{m+1}} q_{m+1}(\underline{\xi}_{m+1}) \approx \inf_{\xi_{m+1}} q_{m+1}(\hat{\underline{\xi}}_m, \xi_{m+1})$$

reduces the numerical effort for the sequential ACMLRT.
Thus, the hypothesis that there is no (m+1)st signal is
rejected if

$$\max_{\xi_{m+1}} Q_{m+1}(\hat{\underline{\xi}}_m, \xi_{m+1}) > \kappa_{m+1} \qquad (6)$$

with $Q_{m+1}(\hat{\underline{\xi}}_m, \xi_{m+1}) = q_m(\hat{\underline{\xi}}_m) - q_{m+1}(\hat{\underline{\xi}}_m, \xi_{m+1})$.

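$Q_{m+1}$ is a kind of geometric mean of i.i.d. F-variables,

$$Q_{m+1} = \frac{1}{P}\sum_{i=1}^{P}\log\Big(1 + \frac{n_1}{n_2}F^i_{m+1}\Big), \qquad F^i_{m+1} = \frac{n_2\,\operatorname{tr}\big[(P^i_{m+1} - P^i_m)\,\hat{C}_X^i\big]}{n_1\,\operatorname{tr}\big[(I - P^i_{m+1}(\hat{\underline{\xi}}_m, \xi_{m+1}))\,\hat{C}_X^i\big]},$$

where $F^i_{m+1}$ is, under the hypothesis, approximately
F-distributed with $n_1$ and $n_2$ degrees of freedom and
stochastically independent for different frequencies $\omega^i$.
The degrees of freedom are as follows:

$$n_1 = 2p(1 + r), \qquad n_2 = 2p(N - (m+1) - r).$$

r is the number of the model parameters (in this application
r = 2). It should be pointed out that the degrees
of freedom $n_1$ and $n_2$ were corrected compared to those
in [4] and [6]. This gives a more accurate approximation
of the test statistic distribution and improves the data
analysis. $\frac{n_1}{n_2}F^i_{m+1}$ can be interpreted as an increase of the
spectral signal-to-noise ratio for a possible (m+1)st signal
at $\xi_{m+1}$. The mean and variance of the random
variable $V = \log(1 + \frac{n_1}{n_2}F^i_{m+1})$ can be calculated
under the hypothesis approximately as

$$\mu_V = \mathrm{E}\,V = \Psi\Big(\frac{n_1 + n_2}{2}\Big) - \Psi\Big(\frac{n_2}{2}\Big), \qquad \sigma_V^2 = \operatorname{Var} V = \Psi'\Big(\frac{n_2}{2}\Big) - \Psi'\Big(\frac{n_1 + n_2}{2}\Big) \approx \frac{2 n_1}{n_2 (n_1 + n_2)},$$

where the function $\Psi$ is defined as the log-derivative of
the $\Gamma$-function: $\Psi(z) = \frac{d}{dz}\ln\Gamma(z)$. In this case we can
approximate the distribution of $Q_{m+1}$ using the central
limit theorem. The detector for the (m+1)st signal then
is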
where κ is the test threshold for the given probability
α of false alarm. We now approximate the test
statistic T by a standard normally distributed variable.
This formulation avoids the cumbersome bootstrap
technique used in [4], [6] and leads to a simple algorithm.
To guarantee a global level α we alternatively
use the sequentially rejecting Bonferroni-Holm test [5]
with maximal possible number of signals M0 = 5. Having
detected the (m+1)st signal, we have to determine
the estimate for m+1 signals by further minimizing q_{m+1} over all m+1 slowness vectors,
e.g., using the maximizer from (6) as initial value.
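
The printed detector formula did not survive extraction, so the following is only a plausible sketch of a central-limit-theorem detector of the kind described. It assumes that each F-variable is exactly F-distributed with n1 and n2 degrees of freedom under the hypothesis, that the P frequency bands are independent, and that the threshold is the standard normal quantile; the exact normalization used by the authors is not recoverable from the text.

```latex
\begin{align}
  % With X_1 ~ chi^2(n_1), X_2 ~ chi^2(n_2): (n_1/n_2) F = X_1 / X_2
  1 + \tfrac{n_1}{n_2}F &= \frac{1}{1 - B}, \qquad B \sim \mathrm{Beta}\big(\tfrac{n_1}{2}, \tfrac{n_2}{2}\big) \\
  % Moments of V = -log(1 - B), with 1 - B ~ Beta(n_2/2, n_1/2)
  \mu_V &= \Psi\big(\tfrac{n_1 + n_2}{2}\big) - \Psi\big(\tfrac{n_2}{2}\big), \qquad
  \sigma_V^2 = \Psi'\big(\tfrac{n_2}{2}\big) - \Psi'\big(\tfrac{n_1 + n_2}{2}\big) \\
  % Q_{m+1} is the average of P independent copies of V
  T &= \sqrt{P}\,\frac{Q_{m+1} - \mu_V}{\sigma_V} \;\overset{\text{approx.}}{\sim}\; \mathcal{N}(0, 1) \\
  % Assumed decision rule at false-alarm probability alpha
  \text{reject if } T &> \kappa, \qquad \kappa = \Phi^{-1}(1 - \alpha)
\end{align}
```

Here Φ denotes the standard normal distribution function. Because the statistic in (6) is a maximum over the candidate slowness vector, the normal approximation is a working assumption rather than an exact result.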

4.  EXPERIMENTS  WITH  MEASURED 
SEISMIC  DATA 

We  apply  the  proposed  algorithm  to  measured  seismic 
data  recorded  by  24  seismometers  of  the  GERESS  array 
situated  in  the  Bavarian  Forest.  Detection  and  localiza¬ 
tion  of  regional  and  local  events  is  one  of  the  main  tasks 
of  the  GERESS  array.  The  localization  depends  mainly 
on  the  velocity  analysis  of  dominant  onsets  in  the  seis¬ 
mograms  and  the  ability  to  discriminate  between  P- 
and  S-phases.  If  we  know  the  structure  of  the  crust,  a 
detailed  classification  is  possible. 

A regional seismic event caused by a blast in an
iron mine at a distance of 171 km from the array is
analysed. We use a sliding window of length T =
3.2 s, corresponding to 128 samples at a sampling
frequency of 40 Hz, and a shift of 20 samples (0.5 s).
This short window length does not allow smoothing
over frequencies as in (4). Instead we stabilized the
estimate by use of Thomson's orthogonal windows
[9] with T = 3. The frequency band used in the analysis
contains P = 33 frequency bins in the range from
… Hz to 10.5 Hz. The maximization of the likelihood
function over the components of the parameter vector
is a computationally difficult task. Nevertheless, the
problem becomes manageable when a global optimization
technique like a genetic algorithm [2] is applied to
(6), followed by a local optimization technique like scoring
around the resulting parameters. The population size
for the genetic algorithm is 20, the probability of crossing
two strings is 0.80, and the probability of mutation is
0.05. We represent the elements of the slowness vector by a
bit string of length 12. Signals were detected and, afterwards,
estimated. Observing Figure 1 we note that the
detected signals are mostly in the arrival domains of
the P-phase ("fast" longitudinal waves) and S-phase
("slow" transverse waves). Analysis of the regional seismic
event was carried out using sequential testing with
a global level α = 0.01 and ACMLEs for azimuth and
phase velocities. The estimates of azimuth and velocities
are in agreement with the physics of seismic wave
propagation and the azimuth of arrival determined by the Institute
of Geophysics at the Ruhr University.
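
The global level α = 0.01 mentioned above is controlled with the sequentially rejecting Bonferroni-Holm test. As a minimal sketch of that generic procedure, the function below takes one p-value per hypothesis and applies Holm's step-down thresholds. How the paper maps its sequence of "no (m+1)st signal" hypotheses to p-values is not stated, so the function name, the p-values and the use of five hypotheses (matching M0 = 5) are illustrative assumptions.

```python
from typing import List


def holm_rejections(p_values: List[float], alpha: float = 0.01) -> List[bool]:
    """Generic sequentially rejective Bonferroni-Holm procedure.

    Returns one flag per hypothesis (True = rejected) while keeping the
    global (family-wise) level at most alpha.
    """
    n = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    rejected = [False] * n
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is compared with alpha / (n - rank).
        if p_values[idx] <= alpha / (n - rank):
            rejected[idx] = True
        else:
            break  # first acceptance: all remaining hypotheses are accepted too
    return rejected


# Hypothetical p-values for up to five candidate signals (not from the paper).
example = holm_rejections([0.0004, 0.2, 0.0015, 0.03, 0.6], alpha=0.01)
print(example)
```

With these made-up p-values the thresholds are 0.01/5, 0.01/4 and 0.01/3 in turn, so only the first and third hypotheses are rejected before the procedure stops.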
Figure 1: Analysis of a regional seismic event using sequential testing with global level
control (α = 0.01) and ACMLEs for azimuth and phase velocities. The received signal of a
reference seismometer is shown on the top.
Abstract

This paper presents a new approach to the problem
of modeling nonstationary CW ultrasonic Doppler signals.
A time-dependent ARMA model whose parameters
vary periodically in time is proposed. The time
variation of the model parameters is approximated by a
weighted sum of a small number of Fourier base functions.
It is seen that the spectral characteristics of the
model output closely approximate the spectral characteristics
of the nonstationary CW ultrasonic Doppler
signals.


1.  Introduction 

Doppler ultrasound is an important and powerful
technique for noninvasive measurement of the velocities
of moving particles within the body. The technique is
particularly used for the measurement of blood velocity
and a number of other parameters related to
blood flow. When employing the technique for blood
velocity measurement, an ultrasonic signal is transmitted
by an ultrasonic transducer through the blood vessel
under examination. This signal is reflected by the
red blood cells, causing an echo which is demodulated
to yield an audible signal called the Doppler signal. The frequency
content of the Doppler signal is closely related
to a number of flow parameters that provide valuable
clinical information regarding the diagnosis of various
vascular diseases and flow disorders. The time variation
of these parameters can be extracted from the Doppler
signal by processing it with appropriate spectral analysis
techniques.

In  order  to  obtain  clinical  diagnostic  information 
from  the  Doppler  signal  with  minimum  error,  it  is  es¬ 
sential  to  use  accurate  and  reliable  techniques  for  the 
analysis of the signal. Since the Doppler signal is highly
nonstationary and its true time-varying spectrum is unknown,
it is impossible to test the accuracy and reliability
of a technique with real Doppler signals. For
this  reason,  new  Doppler  signal  analysis  techniques  are 
first  tested  with  simulated  Doppler  signals  and  then 
employed  for  the  analysis  of  real  signals  if  they  prove  to 
be  reliable.  Therefore  it  is  very  important  to  generate 
artificial  signals  whose  power  spectral  density  functions 
are  known  and  spectral  characteristics  are  as  close  to 
those  of  real  Doppler  signals  as  possible. 

Modeling  ultrasonic  Doppler  signals  has  been  of  high 
interest  in  the  last  decade  since  the  models  developed 
have  not  only  led  to  a  better  understanding  of  the 
mechanism  governing  the  generation  of  the  Doppler  sig¬ 
nal  but  also  provided  valuable  tools  for  the  assessment 
of  various  Doppler  signal  analysis  methods. 

The  classical  method  for  the  simulation  of  Dopp¬ 
ler  signals  is  based  on  the  principle  that  the  output 
of  a  linear  filter  excited  by  a  white  Gaussian  noise  is  a 
Gaussian  random  process  whose  power  spectral  density 
is  equal  to  the  magnitude  squared  of  the  filter  response. 
This  approach  is  used  by  Kristoffersen  and  Angelsen  [1] 
in  a  time  shared  B-mode  imaging  and  Doppler  mea¬ 
surement  system  to  generate  Doppler  signal  segments 
for  filling  in  the  gaps  due  to  B-mode  interruptions  in 
the  output  of  the  Doppler  unit. 

Another approach to Doppler signal simulation is
to first generate some stationary Doppler signals and
then modulate their spectral characteristics with appropriate
time-varying filters. This approach was used
by Leeuwen et al. [2] to test some Doppler blood velocity
measurement systems.

In all of the methods above, a theoretical time-varying
Doppler power spectral density function is assumed.
Next, a filter whose magnitude-squared time-varying
frequency response is similar to the theoretical
Doppler spectrum is designed. This filter is excited
with a white noise signal to yield a nonstationary Doppler
signal at its output. Although this approach is very
practical, the design of the filter can be quite complicated
if the spectral density of the signal changes
rapidly in time, as in the case of CW Doppler signals.

In order to overcome the difficulties arising from the
design of filters having time-varying frequency responses,
Mo and Cobbold [3] proposed a nonstationary
Doppler  signal  simulation  model  based  on  a  weighted 
sum  of  sinusoidal  components.  The  magnitudes  of  the 
components  are  obtained  by  evaluating  a  theoretical 
Doppler  power  spectrum.  Though  this  approach  does 
not  require  complex  filter  designs,  it  still  relies  on  a  the¬ 
oretical  Doppler  power  spectrum  and  is  computation¬ 
ally  more  expensive  than  the  other  methods  mentioned 
above. 

In  this  paper,  we  present  a  new  method  for  the  sim¬ 
ulation  of  the  CW  ultrasonic  Doppler  signals.  The  ap¬ 
proach  presented  here  does  not  depend  on  a  theoretical 
power spectral density. When the time-frequency spectrogram
for a CW Doppler signal is calculated, it is
seen that the variation of power spectral density along
the time axis is nearly periodic, with a period approximately
equal to the average cardiac cycle duration.
Then, if this signal is modeled using a parametric
model, one expects the model parameters to show a
similar  periodic  time  variation  provided  that  the  model 
order,  that  is  the  number  of  the  model  parameters,  is 
fixed.  Based  on  these  observations,  a  time-dependent 
autoregressive  moving  average  (ARMA)  model  is  em¬ 
ployed  in  this  paper  to  model  the  data.  The  variation  of 
the  model  parameters  is  assumed  to  be  periodic  with  a 
period  equal  to  the  average  cardiac  cycle.  Hence,  these 
parameters  are  approximated  with  a  weighted  combi¬ 
nation  of  a  small  number  of  base  functions.  The  Fourier 
base  is  chosen  here  among  a  number  of  bases  available 
in  the  literature  [4].  Comparisons  in  both  time  and 
frequency  domain  show  that  the  method  proposed  here 
can  successfully  be  used  to  model  the  CW  ultrasonic 
Doppler  signals. 

2.  Method 

Let x(0), ..., x(N - 1) denote the N signal samples
to be modeled, which are obtained by uniformly sampling
a CW Doppler signal along the cardiac cycle. For the
purpose of modeling this signal, we propose a time-dependent
ARMA model [5]

$$x(n) + \sum_{i=1}^{p} a_i(n-i)\,x(n-i) = e(n) + \sum_{i=1}^{q} b_i(n-i)\,e(n-i) \qquad (1)$$

where $a_i(n-i)$ and $b_i(n-i)$ are the time-dependent
model parameters, p and q are the model orders, and e(n)
is the driving process, which is a zero-mean, unit-variance
white noise process.

Based on a priori knowledge, we assume that the
time variation of the model parameters is periodic with
a period equal to the number of signal samples
N. Then these parameters can be approximated by a
weighted combination of a small number of base functions.
By employing this approach and choosing the
Fourier base, an approximate representation for the
time variation of the parameters can be obtained as

$$a_i(n) = \sum_{k=-m}^{m} c_{i,k}\,\exp(jk\omega_0 n), \qquad b_i(n) = \sum_{k=-m}^{m} d_{i,k}\,\exp(jk\omega_0 n)$$

where 2m + 1 is the number of base functions, $c_{i,k}$ and
$d_{i,k}$ are the weights, and $\omega_0 = 2\pi/N$. Then the quantities
$a_i(n-i)\,x(n-i)$ and $b_i(n-i)\,e(n-i)$ become

$$a_i(n-i)\,x(n-i) = \sum_{k=-m}^{m} c_{i,k}\,\exp(jk\omega_0(n-i))\,x(n-i)$$

$$b_i(n-i)\,e(n-i) = \sum_{k=-m}^{m} d_{i,k}\,\exp(jk\omega_0(n-i))\,e(n-i)$$

which can also be expressed in vector form as follows:

$$a_i(n-i)\,x(n-i) = u^T(n-i)\,c_i \qquad (2)$$

$$b_i(n-i)\,e(n-i) = v^T(n-i)\,d_i \qquad (3)$$

where

$$u(n-i) = x(n-i)\,[\exp(-jm(n-i)\omega_0)\ \dots\ \exp(jm(n-i)\omega_0)]^T$$

$$v(n-i) = e(n-i)\,[\exp(-jm(n-i)\omega_0)\ \dots\ \exp(jm(n-i)\omega_0)]^T$$

$$c_i = [c_{i,-m}\ \ c_{i,-(m-1)}\ \dots\ c_{i,m}]^T$$

$$d_i = [d_{i,-m}\ \ d_{i,-(m-1)}\ \dots\ d_{i,m}]^T$$

Here the superscript T denotes matrix or vector transposition.

By substituting (2) and (3) into (1) we obtain

$$x(n) + u^T(n-1)\,c_1 + \dots + u^T(n-p)\,c_p = e(n) + v^T(n-1)\,d_1 + \dots + v^T(n-q)\,d_q$$

which may be rewritten compactly as

$$x(n) + \phi^T(n)\,\theta = e(n) \qquad (4)$$

where

$$\phi(n) = [u^T(n-1)\ \ u^T(n-2)\ \dots\ u^T(n-p)\ \ -v^T(n-1)\ \ -v^T(n-2)\ \dots\ -v^T(n-q)]^T$$

$$\theta = [c_1^T\ \dots\ c_p^T\ \ d_1^T\ \dots\ d_q^T]^T$$
(4) is identical to (1) but it offers an important advantage:
the model parameters of (4) are constants
and do not depend on time. Therefore, with the definitions above, we have converted
a time-varying modeling problem into a time-invariant
modeling problem.
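
The Fourier-base expansion of a single time-varying coefficient can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name, the period N = 8 and the weights are hypothetical, and the weights are chosen conjugate-symmetric only so that the resulting coefficient is real.

```python
import cmath
import math


def fourier_coefficient(weights, n, period):
    """Evaluate a(n) = sum_{k=-m}^{m} c_k * exp(j*k*w0*n), with w0 = 2*pi/period.

    `weights` is the list [c_{-m}, ..., c_0, ..., c_m] of length 2m + 1.
    """
    m = (len(weights) - 1) // 2
    w0 = 2.0 * math.pi / period
    return sum(c * cmath.exp(1j * k * w0 * n)
               for k, c in zip(range(-m, m + 1), weights))


# Hypothetical weights for m = 1; c_{-1} = conj(c_1) keeps a(n) real-valued.
N = 8
c = [0.25 - 0.1j, -0.5, 0.25 + 0.1j]

# Evaluate over two periods to show that a(n + N) = a(n).
a = [fourier_coefficient(c, n, N) for n in range(2 * N)]
print([round(value.real, 4) for value in a[:N]])
```

The 2m + 1 constant weights fully describe the coefficient over the whole period, which is what allows the time-varying model to be rewritten with the constant parameter vector of (4).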

We now employ a least squares method to estimate
the parameter vector θ of the new time-invariant
model (4) from the original signal samples
x(0), ..., x(N - 1). If we define

$$\hat{x}(n) = -\phi^T(n)\,\theta \qquad (5)$$

(4) takes the following form:

$$x(n) - \hat{x}(n) = e(n) \ \Rightarrow\ x(n) = \hat{x}(n) + e(n)$$

Here $\hat{x}(n)$ may be thought of as an approximation to
the original signal sample x(n) at time n. In this case
e(n) represents the approximation error. If we apply
this approach to all the signal samples available, we
obtain the following set of equations:

$$x(0) = \hat{x}(0) + e(0)$$
$$x(1) = \hat{x}(1) + e(1)$$
$$\vdots$$
$$x(N-1) = \hat{x}(N-1) + e(N-1)$$

By substituting the definition (5) in the above set
of equations, we get

$$x(0) = -\phi^T(0)\,\theta + e(0)$$
$$x(1) = -\phi^T(1)\,\theta + e(1)$$
$$\vdots$$
$$x(N-1) = -\phi^T(N-1)\,\theta + e(N-1)$$

which can also be written in matrix form as

$$x = \Phi\,\theta + e \qquad (6)$$

where

$$x = [x(0)\ \ x(1)\ \dots\ x(N-1)]^T$$
$$\Phi = -[\phi(0)\ \ \phi(1)\ \dots\ \phi(N-1)]^T$$
$$e = [e(0)\ \ e(1)\ \dots\ e(N-1)]^T$$

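If we choose a cost function V(θ) as the sum of squared
approximation errors

$$V(\theta) = \sum_{n=0}^{N-1} |e(n)|^2$$

a least squares solution for the parameter vector θ in
(6), which minimizes V(θ), can be found as

$$\hat{\theta} = (\Phi^H \Phi)^{-1}\,\Phi^H x \qquad (7)$$

where the superscript H denotes conjugate transposition.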
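
For completeness, the usual route from the cost function to the least squares solution is sketched below. It assumes that (6) has the linear form x = Φθ + e with a full-column-rank complex matrix Φ; this is the textbook derivation, not a reproduction of the paper's own steps.

```latex
\begin{align}
  % Cost function in matrix form
  V(\theta) &= (x - \Phi\theta)^H (x - \Phi\theta) \\
  % Stationarity with respect to the conjugate of theta
  \frac{\partial V}{\partial \theta^*} &= -\Phi^H (x - \Phi\theta) = 0 \\
  % Normal equations and their solution
  \Phi^H \Phi\,\hat{\theta} &= \Phi^H x
  \quad\Longrightarrow\quad
  \hat{\theta} = (\Phi^H \Phi)^{-1}\Phi^H x
\end{align}
```

Note that the regressor φ(n) contains past values of the driving noise e(n), which are not directly observed; the text does not state how they are obtained, so any practical use of this solution needs an additional assumption on that point.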


3.  Results  and  discussion 

We modeled a CW Doppler signal recording obtained
from the carotid artery of a healthy subject. The
average cardiac cycle duration is found to be approximately
0.86 s. The signal at the output of the Doppler
unit is sampled along one cardiac cycle at a sampling
rate of 10 kHz.

The samples are then modeled using the time-varying
ARMA modeling approach proposed in the
previous section. Several values for the parameters m
and p are tried. It is seen that the performance of the
simulation is not strongly dependent on m. The results
presented here are for m = 8 and p = q = 5.

The  signal  at  the  output  of  the  ARMA  model  is 
compared  with  the  original  CW  Doppler  signal  both 
in  frequency  and  time  domains.  For  time-domain  com¬ 
parison,  two  sets  of  signal  samples  from  original  CW 
Doppler  recording,  one  obtained  during  peak  systole 
and  the  other  at  end-diastole,  are  compared  with  the 
corresponding  sets  of  signal  samples  obtained  from  the 
simulated  model.  Each  set  contained  100  samples.  For 
frequency  domain  comparison,  time-frequency  spectro¬ 
grams  for  both  signals  are  calculated  by  using  the  fast 
Fourier  transform  method  via  periodogram  approach. 
The  frame  length  is  chosen  to  be  256  with  a  50%  over¬ 
lap.  Frames  are  windowed  by  using  a  length-256  Han¬ 
ning  window  before  taking  the  FFT. 
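
The framing described above (length-256 frames, 50% overlap, Hanning window) can be sketched as follows. This is an illustration only: it assumes the symmetric form of the window and that only complete frames are used, neither of which is stated in the text, and the function names are hypothetical.

```python
import math


def hann_window(length):
    """Symmetric Hann (Hanning) window: w[n] = 0.5 * (1 - cos(2*pi*n / (length - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
            for n in range(length)]


def frame_starts(num_samples, frame_length=256, overlap=0.5):
    """Start indices of all complete frames for the given fractional overlap."""
    hop = int(frame_length * (1.0 - overlap))  # 128 samples for 50% overlap
    return list(range(0, num_samples - frame_length + 1, hop))


# One cardiac cycle of about 0.86 s sampled at 10 kHz gives about 8600 samples.
num_samples = round(0.86 * 10_000)
starts = frame_starts(num_samples)
window = hann_window(256)
print(len(starts), starts[:3], starts[-1])
```

Each frame would then be multiplied sample by sample with `window` before the FFT; under these assumptions one cardiac cycle yields 66 spectrogram columns.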

Figure 1 shows the comparison of the original signal
and one realization of the simulated signal in the time
domain. In this figure, the solid line represents the model
output while the dotted line represents the original signal.
Here it is seen that the model output very closely
approximates the original signal both at peak systole
(a) and end-diastole (b).

Figure 2 shows the comparison of signal spectrograms
for one cardiac cycle. In this figure, the horizontal
axis shows time (t), the vertical axis frequency (f),
and the gray level at the coordinates (t, f) the power of the
signal component with frequency f at time instant t.
As can easily be seen, the gray-scale speckle patterns
are very similar. This shows that the time variation of
the spectral characteristics of the CW Doppler signal
is well represented by the simulation model. Furthermore,
when the signal obtained at the output of the
ARMA model is converted to analog form and played
back for an audio comparison, it is found to be almost
indistinguishable from the original recording.

The main advantage of the simulation method proposed
in this paper over other methods in the literature
is that it does not require the assumption of a theoretical
power spectral density function. The parameters
of the time-varying ARMA signal model are estimated
by directly using the signal samples available.

Figure 1. Comparison of original (dotted line)
and simulated (solid line) signals in time domain;
(a) peak systole, (b) end-diastole.
Abstract

The continuous growth of electromagnetic pollution
produces many problems during radio astronomy observations.
A method of rejection of industrial interference
was proposed and tested. The main idea consists of
using change-point detection with implementation in a
digital signal processor. The modified cumulative sum
(CUSUM) method, which analyzes several moments of the
empirical probability distribution of the input noise and
uses adaptive thresholds, was elaborated, simulated on a
computer and then tested in real observations on a
radiotelescope. The obtained sensitivity was much better
than without such processing. The proposed real-time
signal processing procedure may be used at many
of those radiotelescopes which suffer from industrial
interference, especially at long wavelengths.


1. Introduction

Electromagnetic pollution limits the real sensitivity
of modern radiotelescopes, especially at long wavelengths.
Industrial interference, radars, cars, radio
stations etc. produce a lot of noise which is averaged
with the natural noiselike signals from extraterrestrial
radio sources. There were several attempts to
build new radiotelescopes at radioecologically isolated
places. Also, radiointerferometry is less sensitive to industrial
interference due to the absence of correlation
of such noise at long distances. But the problem of
single-dish wide-spectrum radio observations in bad radioecological
conditions forces one to look for special
methods of signal processing which could improve the
output precision of observational data. This report describes
work on a real-time digital signal processing
(DSP) system using special DSP processors from the
TMS320 family (TMS320C53). This system was tested
at the RATAN-600 radiotelescope during radio observations
at 13 cm and 31 cm wavelengths.

2. Proposed Method and Main Results

The block diagram of the radiometer with postdetection
DSP stages is shown in Fig. 1. The sampling frequency
f_s of the 10-bit analogue-digital converter (ADC)
was limited by the serial link between ADC and DSP
(5 Mbit/s). In our case f_s was equal to 200 Ksamples/s.

Figure 1. Block diagram of the receiver with
digital back end.


First of all the form and structure of industrial
noise were studied. The most typical were stochastic
bursts with complex structures and radar impulses.
The common processing in radiometers includes the
averaging of fluctuations after the detector during several
seconds, minutes and sometimes hours. It is clear
that in the presence of such interference the smoothing
of bursts deteriorates the output standard deviation
(r.m.s. error). It is necessary to intercept these
bursts without smoothing and eliminate them. The algorithm
performing this procedure must be quick, simple
and effective. Several procedures were tested from this
point of view and finally the modified cumulative sum
method (CUSUM) [1] with adaptive threshold was chosen,
simulated on a computer and implemented in real radio
observations. The quality criterion of the algorithm
was the radiometer fluctuation sensitivity, i.e. the r.m.s.
level of output oscillations ΔT. For the ideal total-power
radiometer the minimum detectable change in
the radiometer antenna temperature is equal to [2]

$$\Delta T = \gamma\,T_{sys}\sqrt{\Delta F/\Delta f} \qquad (1)$$

where Δf is the radio-frequency (predetection) bandwidth of the receiver,
ΔF is the postdetection bandwidth, and T_sys is the effective noise
temperature of the radiotelescope. Ideal filters with rectangular
passband forms are supposed. The mean value
after the square-law detector is proportional to T_sys
and must be measured with the highest possible precision
(minimum standard deviation). The factor γ ≥ 1 depends
on the structure of the radiometer [2], and in this
work a radiometer with noise adding and antiphase
gain modulation was used [3]. In the presence of interference
the output postdetection oscillations may be
written as

$$y = x_{sys} + e_{int} \qquad (2)$$

where  noiselike  system  signal  which  mean 
and  variance  6x9^9  proportional  to  T«y«y  and  eut 

-  oscillations  due  to  interference.  In  digital  case  /•  = 
2AF  and  the  time  of  integration  r  =  no//#,  no  -  num¬ 
ber  of  averaged  samples  which  characterize  the  integra¬ 
tion  interval, no  ~  N  *  M ,  N  -  length  in  samples  of  the 
modulation  halfperiod,  M  -  number  of  modulation  pe¬ 
riods  in  total  averaging  interval.  The  output  standard 
deviation  (1)  in  these  terms  is  proportional  to 

6x1  *  *>;7\/l/(A/r)  (3) 

The postdetection interference burst $e_{int}$ has mean
$\bar{e}_{int}$ and duration $n_{int}$ (in samples). The output mean
$\bar{y}_2$ (system + interference) is equal to

$$\bar{y}_2 = \bar{y}_{sys}(1 - \alpha) + (\bar{y}_{sys} + \bar{e}_{int})\,\alpha \qquad (4)$$

where $\alpha = n_{int}/n_0$, $0 \le \alpha < 1$, $n_{int} = \sum_i I[i]$, and the
indicator function $I[i]$ is defined as $I[i] = 1$ for $e_{int}[i] > 0$
and $I[i] = 0$ for $e_{int}[i] = 0$. For the detection of change
points of the mean at the $l$-th interval, the
following CUSUM procedure was used:

$$S_0 = 0, \quad S_k = \sum_{i=1}^{k} [\ldots], \quad m_k = \min S_k, \quad 0 < k < n \qquad (5)$$


Figure 2. The ratio of the r.m.s. error
without interference elimination to the
error after this elimination, Q = 10000.
Curves for α = 0.2, α = 0.4, α = 0.6, α = 0.8.

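the moment of the mean change is the index $k$ at which
$S_k - m_k > A$, where $A$ is a threshold defined as
$A = \beta \cdot \hat{\sigma}_{l-1}$, $\hat{\sigma}_{l-1}$ is the estimate of the
standard deviation at the $(l-1)$-th interval, and $\beta$ is a factor
chosen by the operator. The procedure also uses the minimum
expected value of the mean change; $i = 1, \ldots, n$, where $n$ is the
number of samples in each $l$-th interval.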
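
The detection rule described here can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summand of the cumulative sum is assumed to be the deviation of each sample from a reference mean `mu0` minus half of the minimum expected change `delta` (the textbook one-sided CUSUM form), and all function and parameter names are hypothetical. An alarm is raised when the cumulative sum exceeds its running minimum by the threshold A = β·σ̂.

```python
import numpy as np


def cusum_detect(x, mu0, sigma_hat, beta=5.0, delta=1.0):
    """Return the index of the first CUSUM alarm, or None if there is none.

    mu0       -- reference mean of the interference-free signal (assumed known)
    sigma_hat -- standard deviation estimated on the previous interval
    beta      -- operator-chosen factor; threshold A = beta * sigma_hat
    delta     -- minimum expected change of the mean (assumed summand offset)
    """
    threshold = beta * sigma_hat
    s = 0.0        # cumulative sum S_k, with S_0 = 0
    s_min = 0.0    # running minimum m_k
    for k, xk in enumerate(np.asarray(x, dtype=float)):
        s += xk - mu0 - delta / 2.0
        s_min = min(s_min, s)
        if s - s_min > threshold:
            return k
    return None
```

With an adaptive threshold, `sigma_hat` would be re-estimated on each clean interval before the next one is tested; samples flagged in this way are excluded from the average rather than smoothed.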

After elimination of "contaminated" intervals with total
duration $n_{int}$ the standard deviation is

$$\sigma_{y3} = \bar{y}_{sys}\,\gamma \sqrt{1/(2\Delta f\,(1 - \alpha)\,\tau)} \qquad (6)$$

which is worse compared with (3). The ratio of the
standard deviation without elimination of interference
to $\sigma_{y3}$ may be considered as the gain obtained as a
consequence of this procedure and is equal to

$$G_0 = \sqrt{(1 - \alpha)(1 + [\ldots])} \qquad (7)$$

where $Q = 2\Delta f\,\tau$ is the radiometer factor. The curves
for $G_0$ versus [...] are shown in Fig. 2; each curve
corresponds to a definite $\alpha$, from 0.2 to 0.8, $Q = 500$.
The nonideal elimination due to the finite threshold
$A$ gives a reduced gain $G_1$ [...],
where $Q_0 = \Delta f/\Delta F$.

The results of interference elimination during RATAN-600
radio observations at 31 cm wavelength in real time
are given in Fig. 3. One can see considerable elimination
of interference without loss of the total r.m.s. sensitivity.

3. Conclusion

Real-time digital signal processing in a wide
videoband may improve the observational situation in
the presence of industrial interference. Rapid progress
in DSP gives hope that the whole postdetection
processing in radio astronomy will be performed
digitally.

This work was supported by the RFFI grant 95-02-03770.

Abstract

A Pseudo-Linear method for the estimation of Fractionally
Integrated ARMA (ARFIMA) models is introduced.
The method uses a long binomial series expansion
of the fractional differencing operator, as well as
the relationship of the AR/MA parameters and binomial
expansion terms with the model's inverse function.
It is based upon a pseudo-linear formulation motivated
by the fact that this relationship leads to a special-form
regression problem that can be decomposed into a scalar
non-linear and a multiple linear regression. The performance
characteristics of the method are demonstrated
via Monte Carlo experiments and comparisons with the
frequency-domain Maximum Likelihood method.

1.  Introduction 

Most  of  the  work  on  time  series  analysis  has  been 
concerned  with  series  characterized  by  the  property 
that  distant  observations  behave  independently,  or 
nearly  so.  Yet,  in  many  empirical  studies  [1-3]  the  de¬ 
pendence  between  distant  observations  is  not  negligi¬ 
ble  and  decays  very  slowly.  Series  with  such  long-term 
persistence  are  referred  to  as  long-memory  time  series, 
and  their  power  spectral  density  increases  indefinitely 
as  the  frequency  approaches  zero,  while  their  autocor¬ 
relation  decays  hyperbolically. 

Long-memory time series are not well represented by
the usual stationary AutoRegressive Moving Average
(ARMA) models, which are characterized by a power
spectral density that is bounded at the origin and an
exponentially decaying autocorrelation [4].

A class of models that exhibits the foregoing long-memory
characteristics is that of Fractionally Integrated
ARMA (ARFIMA) models. This is in essence
an extension of the Integrated ARMA (ARIMA) models
of Box et al. [4], in which the differencing operator
$(1 - B)$ is raised to a fractional, instead of the usual
integer, power.

The  majority  of  the  available  ARFIMA  model  esti¬ 
mation  methods  follow  a  two-step  approach,  according 
to  which  an  estimate  of  the  fractional  power  is  obtained 
(usually  in  the  frequency  domain)  in  the  first  step,  and 
a  standard  ARMA  estimation  technique  is  applied  to 
the  adjusted  (filtered  by  the  fractional  differencing  op¬ 
erator)  time  series  in  the  second.  These  methods  have 
been  criticized  for  failing  to  produce  good  estimates  for 
relatively  short  data  records  [2].  The  alternative  one- 
step  methods  advocate  the  simultaneous  estimation  of 
all  model  parameters  based  upon  variants  of  the  Max¬ 
imum  Likelihood  procedure  in  either  the  time  or  the 
frequency  domains  [2,5-7].  A  major  drawback  of  this 
category  of  methods  is  their  high  computational  com¬ 
plexity. 

In this paper a simple and computationally efficient
Pseudo-Linear method for ARFIMA model estimation
is introduced. The method uses exclusively
time-domain operations and is based upon the
decomposition of a special-form regression problem into a
scalar non-linear and a multiple linear regression.


2.  Problem  statement 

An ARFIMA(n, d, m) process is of the form:

$$\Phi(B) \cdot (1 - B)^d \cdot X_t = \Theta(B) \cdot a_t$$

$$a_t \sim \text{i.i.d. } N(0, \sigma_a^2), \qquad d \in (-0.5, 0.5) \qquad (1)$$

with $t$ indicating discrete time, $X_t$ the observed time
series, $a_t$ an independently identically distributed (i.i.d.)
Gaussian sequence with the indicated mean and variance,
$B$ the backshift operator ($B X_t = X_{t-1}$), $d$ the
fractional power, and $\Phi(B)$, $\Theta(B)$ the autoregressive
(AR) and moving average (MA) polynomials:

$$\Phi(B) = 1 + \sum_{i=1}^{n} \phi_i B^i, \qquad \Theta(B) = 1 + \sum_{i=1}^{m} \theta_i B^i \qquad (2)$$

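Fractional differencing is defined by the binomial series
expansion:

$$(1 - B)^d = 1 + \sum_{j=1}^{\infty} \alpha_j B^j = 1 + \alpha_1 B + \alpha_2 B^2 + \ldots \qquad (3)$$

$$\alpha_1 = -d, \qquad \alpha_j = \alpha_{j-1} \cdot \frac{j - 1 - d}{j} \quad (j = 2, 3, \ldots) \qquad (4)$$

The process representation (1) is assumed to satisfy
the following standard assumptions:

A1. $d \in \mathcal{D} \triangleq (-0.5, 0.5)$, $\Phi(B) \neq 0$ for $|B| \le 1$
(stationarity conditions)

A2. $\Theta(B) \neq 0$ for $|B| \le 1$ (invertibility condition)

The problem of ARFIMA process estimation may
then be stated as follows: "Given time series data $X_t$
($t = 1, \ldots, N$), select a particular model $\mathcal{M}(\mathbf{p})$ from the
model set¹:

$$\mathcal{M} = \{ \mathcal{M}(\mathbf{p}) : \Phi(B) \cdot (1 - B)^d \cdot X_t = \Theta(B) \cdot e_t(\mathbf{p}),$$
$$\mathbf{p} = [d \;\; \boldsymbol{\phi}^T \;\; \boldsymbol{\theta}^T \;\; \sigma_e^2]^T \in \mathcal{D} \times \mathcal{R}(\Phi) \times \mathcal{R}(\Theta) \times \Re^+ \} \qquad (5)$$

where $e_t(\mathbf{p})$ represents the model's one-step-ahead
prediction error, $\sigma_e^2$ its variance, $\boldsymbol{\phi}$, $\boldsymbol{\theta}$ the AR and MA
parameter vectors, respectively, and $\mathcal{R}(\Phi)$, $\mathcal{R}(\Theta)$ the
regions of $\Re^n$, $\Re^m$ in which the stationarity and
invertibility conditions, respectively, hold."

¹ Bold face lower-case/capital characters represent
vector/matrix quantities.

3. The pseudo-linear estimation method

The substitution of a truncated, p-th order, binomial
series expansion of the fractional differencing operator
(3) into the ARFIMA representation (1) yields the
ARMA(p + n, m) representation:

$$(1 + \rho_1 B + \ldots + \rho_{p+n} B^{p+n}) \cdot X_t = (1 + \theta_1 B + \ldots + \theta_m B^m) \cdot a_t \qquad (6)$$

with $\rho_i$ defined by the convolution expressions:

$$\rho_i = \sum_{k=0}^{i} \alpha_k \phi_{i-k} \quad (i = 1, 2, \ldots, p+n) \qquad (7)$$

with $\alpha_0 = 1$, $\alpha_k = 0$ ($k > p$), $\phi_0 = 1$, $\phi_k = 0$ ($k > n$).
Denoting this representation's inverse function operator
as:

$$I(B) = 1 + \sum_{i=1}^{\infty} i_i B^i = P(B)/\Theta(B) \qquad (8)$$

(with $P(B)$ the left-hand-side polynomial of (6)) and
combining it with (7) yields:

$$\begin{aligned}
\alpha_1 + \phi_1 - \theta_1 &= i_1 \\
\alpha_2 + \alpha_1 \phi_1 + \phi_2 - i_1 \theta_1 - \theta_2 &= i_2 \\
&\;\;\vdots \\
\alpha_p + \alpha_{p-1} \phi_1 + \ldots + \alpha_{p-n} \phi_n - i_{p-1} \theta_1 - \ldots - i_{p-m} \theta_m &= i_p \\
\alpha_p \phi_1 + \ldots + \alpha_{p-n+1} \phi_n - i_p \theta_1 - \ldots - i_{p-m+1} \theta_m &= i_{p+1} \\
\alpha_p \phi_2 + \ldots + \alpha_{p-n+2} \phi_n - i_{p+1} \theta_1 - \ldots - i_{p-m+2} \theta_m &= i_{p+2} \\
&\;\;\vdots \\
\alpha_p \phi_n - i_{p+n-1} \theta_1 - \ldots - i_{p+n-m} \theta_m &= i_{p+n}
\end{aligned} \qquad (9)$$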
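
As a small illustration of the binomial recursion (4) and the convolution (7), the following Python sketch computes the truncated fractional differencing coefficients and the resulting long-AR coefficients. The function names are hypothetical, and the sign convention of (2), $\Phi(B) = 1 + \sum \phi_i B^i$, is assumed.

```python
import numpy as np


def frac_diff_coeffs(d, p):
    """Coefficients alpha_0..alpha_p of the truncated expansion of (1 - B)^d."""
    alpha = np.empty(p + 1)
    alpha[0] = 1.0
    for j in range(1, p + 1):
        # alpha_1 = -d, alpha_j = alpha_{j-1} * (j - 1 - d) / j
        alpha[j] = alpha[j - 1] * (j - 1 - d) / j
    return alpha


def long_ar_coeffs(d, phi, p):
    """Coefficients rho_0..rho_{p+n}: convolution of alpha with [1, phi_1..phi_n]."""
    phi_poly = np.concatenate(([1.0], np.asarray(phi, dtype=float)))
    return np.convolve(frac_diff_coeffs(d, p), phi_poly)
```

For example, `long_ar_coeffs(0.3, [0.6], 3)` returns five values, the first being 1 and the second being $\alpha_1 + \phi_1 = 0.3$.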


3.1  Stage  one  estimation 

In  this  stage  initial  parameter  estimates  are  obtained 
based  upon  the  inverse  function  operator  (8). 

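Inverse function estimation
Consider the model:

$$I(B, \mathbf{i}) \cdot X_t = e_t(\mathbf{i}) \qquad (10)$$

that corresponds to the process representation implied
by (8). In this model $I(B, \mathbf{i})$ represents a finite
(truncated) $s$-th ($s > p + n$) order approximation (permitted
by way of assumption A2) of the inverse function
operator, $\mathbf{i}$ the corresponding parameter vector, and $e_t(\mathbf{i})$
the model's one-step-ahead prediction error at time $t$.
An interval estimate of the inverse function parameter
vector is obtained through the expressions:

$$\hat{\mathbf{i}} = -\left( \sum_{t=s+1}^{N} \boldsymbol{\psi}_t \boldsymbol{\psi}_t^T \right)^{-1} \left( \sum_{t=s+1}^{N} \boldsymbol{\psi}_t X_t \right) \qquad (11)$$

$$\mathrm{Cov}_{\hat{\mathbf{i}}} = \hat{\sigma}_e^2 \left( \sum_{t=s+1}^{N} \boldsymbol{\psi}_t \boldsymbol{\psi}_t^T \right)^{-1} \qquad (12)$$

$$\hat{\sigma}_e^2 = \frac{1}{N - s} \sum_{t=s+1}^{N} e_t^2(\hat{\mathbf{i}}) \qquad (13)$$

with $\boldsymbol{\psi}_t = [X_{t-1} \;\; X_{t-2} \;\; \cdots \;\; X_{t-s}]^T$.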
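
The least-squares fit of the truncated inverse function can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the model is taken as $X_t + \sum_{j=1}^{s} i_j X_{t-j} = e_t$ (hence the minus sign in the estimate), and the function name and return layout are hypothetical.

```python
import numpy as np


def fit_inverse_function(x, s):
    """Least-squares estimate of i_1..i_s with covariance and residual variance."""
    x = np.asarray(x, dtype=float)
    N = x.size
    # Row for time t holds psi_t = [x[t-1], ..., x[t-s]], for t = s .. N-1 (0-based)
    Psi = np.column_stack([x[s - j:N - j] for j in range(1, s + 1)])
    target = x[s:]
    gram = Psi.T @ Psi
    i_hat = -np.linalg.solve(gram, Psi.T @ target)
    resid = target + Psi @ i_hat            # one-step-ahead prediction errors
    sigma2 = float(resid @ resid) / (N - s)
    cov = sigma2 * np.linalg.inv(gram)
    return i_hat, cov, sigma2
```

For a pure AR(1) series $X_t = 0.6 X_{t-1} + a_t$ the inverse function is $1 - 0.6B$, so the first estimated coefficient should be close to -0.6 and the remaining ones close to zero.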
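Initial parameter estimation

Initial parameter estimation is based upon expressions
(9) that relate the fractional power and AR/MA
parameters with the model's inverse function. Given
inverse function estimates, these lead to:

$$\mathbf{y} \triangleq \hat{\mathbf{i}} - \boldsymbol{\alpha} = \mathbf{X} \cdot \mathbf{p} + \mathbf{e} \qquad (14)$$

where:

$$\mathbf{X} = [\, \mathbf{A} \;\; -\hat{\mathbf{I}} \,] \qquad (15)$$

$$A_{[(p+n) \times n]}(i, j) = \begin{cases} 0 & i < j \\ 1 & i = j \\ \alpha_{i-j} & \text{otherwise} \end{cases} \qquad (16)$$

$$\hat{I}_{[(p+n) \times m]}(i, j) = \begin{cases} 0 & i < j \\ 1 & i = j \\ \hat{i}_{i-j} & \text{otherwise} \end{cases} \qquad (17)$$

$$\mathbf{p} = [\phi_1 \ldots \phi_n \;\; \theta_1 \ldots \theta_m]^T, \qquad \hat{\mathbf{i}} = [\hat{i}_1 \ldots \hat{i}_{p+n}]^T \qquad (18)$$

$$\boldsymbol{\alpha} = [\alpha_1 \ldots \alpha_p \;\; 0 \ldots 0]^T \qquad (19)$$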

with $\mathbf{e}$ representing an error vector. Expressions (14)
define a special-form regression problem that is non-linear
in the fractional power $d$ but linear in the
AR/MA parameter vector $\mathbf{p}$. The optimization of the
regression cost function:

$$J(d, \mathbf{p}) = \mathbf{e}^T \mathbf{Q}_{k \times k}\, \mathbf{e} \qquad (20)$$

may then be accomplished through a pseudo-linear
two-step procedure, according to which the fractional
power is varied through an appropriate search scheme
and, conditional upon it, AR/MA parameter estimates
are obtained as:

$$\hat{\mathbf{p}}(d) = \left( \mathbf{X}_{k \times (n+m)}^T \mathbf{Q}_{k \times k} \mathbf{X}_{k \times (n+m)} \right)^{-1} \mathbf{X}_{k \times (n+m)}^T \mathbf{Q}_{k \times k}\, \mathbf{y}_{k \times 1} \qquad (21)$$

In the above $k$ ($n + m \le k \le n + p$) refers to the
number of scalar equations of (14) actually used in the
regression, while $\mathbf{Q}_{k \times k}$ represents a proper weighting
matrix. The procedure is terminated once the minimum
of $J(d, \mathbf{p})$ is achieved. The innovations variance
is then estimated as:

$$\hat{\sigma}_e^2 = \frac{1}{N - q} \sum_{t=q+1}^{N} e_t^2(\hat{\mathbf{p}}) \qquad (22)$$

with $q = p + n + m$.

3.2 Stage two estimation

This stage aims at refining the estimates of stage
one. Let:

$$\hat{\mathbf{p}}_{i-1} = [\hat{d}_{i-1} \;\; \hat{\boldsymbol{\phi}}_{i-1}^T \;\; \hat{\boldsymbol{\theta}}_{i-1}^T]^T$$

denote the vector of ARFIMA parameter estimates
obtained at iteration $i - 1$, and initially equal to those
provided by stage one. At iteration $i$ these estimates
are updated as follows:

Fractional power and AR parameter estimation
Assuming small perturbations in the MA parameter
estimates during successive iterations, that is
$\Theta(B, \hat{\mathbf{p}}_{i-1}) \approx \Theta(B, \hat{\mathbf{p}}_i)$, the ARFIMA model (5) may,
at iteration $i$, be approximately expressed as:

$$e_t(\hat{\mathbf{p}}_i) \approx \Phi(B, \hat{\mathbf{p}}_i) \cdot (1 - B)^{\hat{d}_i} \cdot X_t^F \qquad (23)$$

with:

$$X_t^F \triangleq X_t / \Theta(B, \hat{\mathbf{p}}_{i-1}) \qquad (24)$$

The model (23) is of the Fractionally Integrated
AutoRegressive [FIAR(n, d)] form, and its parameters
may be estimated via a procedure similar to that of
Stage 1. In this case equation (14) is such that $\mathbf{X} = \mathbf{A}$,
$\mathbf{p} = \boldsymbol{\phi}$, and the weighting matrix $\mathbf{Q}_{k \times k}$ in (20) is
selected equal to the corresponding submatrix of the
estimated inverse function covariance (12). Due to the
form of (14) in this case, this leads to optimal, in the
sense of the Gauss-Markov theorem [8], estimates.
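
The pseudo-linear search of stage one (a scalar search over the fractional power d, with a linear regression for the AR/MA parameters at each candidate value) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes an identity weighting matrix, uses all p + n scalar equations, replaces the "appropriate search scheme" with a plain grid, and uses hypothetical function names.

```python
import numpy as np


def frac_diff_coeffs(d, p):
    """alpha_0..alpha_p of the truncated binomial expansion of (1 - B)^d."""
    alpha = np.empty(p + 1)
    alpha[0] = 1.0
    for j in range(1, p + 1):
        alpha[j] = alpha[j - 1] * (j - 1 - d) / j
    return alpha


def lower_band(c, rows, cols):
    """rows x cols matrix M with M[i, j] = c[i - j] for i >= j, zero otherwise."""
    M = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(min(i + 1, cols)):
            M[i, j] = c[i - j]
    return M


def pseudo_linear_fit(i_hat, n, m, p, d_grid):
    """Grid search over d; ordinary least squares for (phi, theta) given d.

    Returns (cost, d, phi, theta) at the smallest residual sum of squares.
    """
    rows = p + n
    i_hat = np.asarray(i_hat, dtype=float)[:rows]
    # Inverse-function matrix: ones on the diagonal, i_hat_{i-j} below it
    I_mat = lower_band(np.concatenate(([1.0], i_hat[:-1])), rows, m)
    best = None
    for d in d_grid:
        alpha = np.zeros(rows + 1)          # alpha_k = 0 for k > p
        alpha[:p + 1] = frac_diff_coeffs(d, p)
        A = lower_band(alpha[:rows], rows, n)
        X = np.hstack([A, -I_mat])
        y = i_hat - alpha[1:]
        params, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ params
        cost = float(resid @ resid)
        if best is None or cost < best[0]:
            best = (cost, float(d), params[:n], params[n:])
    return best
```

In use, `i_hat` would come from the long autoregressive fit of the inverse function, and a finer grid or a one-dimensional minimizer could replace the coarse grid once the neighbourhood of the minimum is located.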


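MA parameter and innovations variance estimation
The MA parameters are then updated by solving the
linear regression problem [obtained from (14)]:

$$-\hat{\mathbf{I}} \cdot \boldsymbol{\theta} = \tilde{\mathbf{y}} + \tilde{\mathbf{e}} \qquad (25)$$

with $\tilde{\mathbf{e}}$ denoting the regression error vector, and:

$$\tilde{y}_i = \hat{i}_i - \sum_{k=0}^{i} \hat{\alpha}_k \hat{\phi}_{i-k} \quad (i = 1, 2, \ldots, p + n) \qquad (26)$$

where $\hat{\alpha}_0 = 1$, $\hat{\alpha}_k = 0$ ($k > p$), $\hat{\phi}_0 = 1$, $\hat{\phi}_k = 0$ ($k > n$).

The innovations variance is updated through (22).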

4.  Numerical  experiment 

Consider the ARFIMA(1, d, 2) process with parameters
indicated in Table 1. This process is characterized
by a sharp spectral valley, owing to the proximity
of its complex conjugate pair of zeros to the unit circle
(magnitude of 0.995).

Parameter   Actual    Estimate ± std. deviation
                      PL                ML
d            0.30      0.313 ± 0.086     0.301 ± 0.067
φ_1          0.60      0.582 ± 0.073     0.587 ± 0.062
θ_1         -0.40     -0.396 ± 0.032    -0.380 ± 0.032
θ_2          0.99      0.916 ± 0.031     0.865 ± 0.097
σ_a²         1.00      0.950 ± 0.086     1.124 ± 0.136

Table 1. Monte Carlo estimation results by the
Pseudo-Linear (PL) and Maximum Likelihood
(ML) methods (N = 300; 20 runs).

Monte Carlo estimation results by the Pseudo-Linear
(PL) (p = 20, s = 30) and frequency-domain
Maximum Likelihood (ML) [9] methods, based
upon 300-sample-long data records, are summarized in
Table 1. Despite the relatively short data record length
and the significantly higher computational complexity
of the ML method, the performance characteristics
of the two methods appear similar. The ML method
provides a slight improvement in the fractional power
estimate, while, quite interestingly, the PL method
achieves a noticeable improvement in the MA parameter
estimates.

These observations are additionally confirmed by
the estimated spectra, which are contrasted to the
theoretical process spectrum in Figure 1. From these it
is indeed evident that the PL method achieves a
significantly reduced estimation scatter in the neighborhood
of the spectral valley, while also providing a few
estimates that are practically indistinguishable from
the theoretical curve in the proximity of the spectral
minimum. Similar behavior is observed with longer
(N = 1,000) data records and processes characterized
by sharp spectral peaks [10].
Abstract

The stationarity and local Gaussianity of ambient
shipping noise recorded during an experiment conducted
in the San Diego port area are investigated. First
through fourth order moments are used to identify time
periods of nonstationarity in the noise. Comparison of
the shipping data with colored Gaussian noise indicates
that the third order moments deviate from Gaussianity
more than the fourth order moments. The local
Gaussianity is quantified using the Kolmogorov-Smirnov
test. While the shipping noise at the deeper
depth appears somewhat nonGaussian during certain
time periods, the shallower depth data appear
Gaussian.

1.  Introduction 

In  an  effort  to  detect  quieter  signals  in  noisier 
environments,  alternative  detection  algorithms  using 
higher  order  statistics  have  been  proposed  by  various 
researchers  and  have  shown  promise  in  simulations  for 
a  variety  of  scenarios.  Many  of  these  detectors  exploit 
the  difference  between  the  higher  order  statistics  of  the 
signal  and  the  noise,  which  is  ideally  stationary  and 
Gaussian.  However,  in  shallow-water  environments 
complicated  by  factors  such  as  heavy  shipping,  surf 
noise,  and  multipath  signal  distortion,  detection 
algorithms  are  generally  not  optimal  and  the 
assumption  that  ambient  noise  is  Gaussian,  or  even 
stationary,  may  not  hold. 

Computer  simulations  and  theoretical  developments 
have  shown  that  higher  order  moments  can  passively 
detect  transient  signals  in  Gaussian  noise  better  than 
the  ordinary  cross  correlation  detector.  In  studies  by 
Ioup et al. [1] and Pflug et al. [2], [3], correlations are
calculated  using  information  from  multiple  sensors, 
with  the  number  of  sensors  required  being  equal  to  the 
order  of  correlation.  In  this  situation,  the  second  and 
higher  order  moments  of  the  noise  have  no  effect  on 
detector  performance  if  the  noise  is  uncorrelated. 
However,  if  correlations  are  formed  by  repeating 
information  from  only  one  sensor,  which  is  sometimes 
all  that  is  available  for  processing,  then  the  higher 


order  moments  of  the  noise  can  affect  detection.  To 
extend  this  work  on  higher  order  correlation  transient 
detectors  to  complicated  shallow  water  environments,  it 
is  important  to  have  a  realistic  estimate  of  the  higher 
order  moments  of  ambient  noise  and  their  stationarity. 

In  this  work,  measured  shallow  water  ambient  noise 
due  primarily  to  shipping  is  investigated  with  the  goal 
of  determining  whether  stationarity  and  Gaussianity 
assumptions  for  transient  detectors  are  appropriate,  and 
if  so,  for  what  time  periods.  While  it  is  generally 
accepted  that  ambient  noise  due  to  nearby  shipping  is 
nonGaussian,  only  a  few  attempts  have  been  made  to 
explore  the  nature  of  the  nonGaussianity  [4]-[7].  In  [4], 
Brockett  et  al.  examine  the  third  order  statistics  of 
noise  dominated  by  distant  shipping  or  by  one  nearby 
ship.  In  [5],  Hinich  et  al.  show  that  the  towing 
platform  in  an  experiment  has  strong  bispectral 
components.  Richardson  and  Hodgkiss  [6]  use  the 
bicoherence  to  determine  that  a  recorded  deep-water  time 
series  is  nonGaussian.  Only  Dalle  Molle  and  Hinich 
[7]  consider  the  fourth  order  statistic,  showing  that  the 
noise  generated  by  two  ships  approximately  460m 
from  a  sonobuoy  is  not  significantly  different  from 
Gaussian  noise.  However,  none  of  these  studies 
investigates  the  statistics  of  ambient  noise  generated  by 
a  multitude  of  nearby  ships,  such  as  would  occur  in  a 
port  area. 

2.  SWellEX-3  Experiment 

Ambient  noise  recorded  during  the  SWellEX-3 
experiment  is  used  to  investigate  the  first,  second,  and 
higher  order  moments  of  ambient  noise  due  to  shipping 
in  a  moderately  busy  port  area.  The  data  were  taken 
near the port of San Diego, California, in July-August
1994  [8].  Ambient  noise  measurements  were  recorded 
on  a  vertical  64-element  array  with  2  m  spacing, 
located  in  water  approximately  200  m  deep.  Two 
channels  of  data  were  chosen  for  analysis,  2  and  43, 
with  respective  depths  of  192  m  and  116  m. 

The data are sampled at 1500 samples/second. Only
3-minute data segments have been used for analysis so
far. The data are calibrated and mooring platform
self-noise is reduced or removed. Additionally, a high-pass
Butterworth filter of order nine with a cutoff frequency
of 15 Hz was applied in an attempt to reduce the effects
of sensor motion, or flow noise, that appeared in the
uppermost phones.

Ships in the port were tracked with radar during the
experiment, and the radar tracks are used to identify times of low,
moderate,  and  high  shipping  activity  for  analysis.  It 
should  be  noted  that  shipping  traffic  in  the  area  is 
always  significant,  and  the  terms  low,  moderate,  and 
high  are  only  relative  and  describe  the  number  of  ships 
in  the  general  area  and  to  some  degree  the  proximity  to 
the  array. 

3.  Higher  Order  Moment  Analysis 

Three different length processing windows are used to
calculate the changing first through fourth order
moments of the data, defined by

$$m_p = \frac{\Delta t}{T} \sum_{k=0}^{N-1} n^p(k\,\Delta t)$$

where $n(t)$ is the recorded noise, $\Delta t$ is the sampling
interval, $T$ is the window duration, $N = T/\Delta t$ is the number
of samples in the window, and $p$ is the order of
correlation. A 99% overlap of the sliding window
corresponding  to  a  moment  sampling  rate  of  50 
samples/second  was  found  sufficient  to  prevent  aliasing 
in  the  time-variation  of  the  second,  third,  and  fourth 
order  moments.  However,  a  small  degree  of  aliasing 
still  exists  in  the  first  order  moment.  Calculations  of 
processing  window  length  versus  mean  moment  values 
indicate  that  the  moments  are  reasonably  stable  for  one- 
second  intervals. 
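
A minimal Python sketch of the sliding-window moment calculation is given here for illustration. The function and parameter names are hypothetical; since $T = N \Delta t$, the factor $\Delta t / T$ reduces to a plain average over the window. The default step of 0.02 s yields a moment sampling rate of 50 samples/second.

```python
import numpy as np


def sliding_moments(noise, fs, window_s=1.0, step_s=0.02, orders=(1, 2, 3, 4)):
    """Moments m_p of each sliding window; one row per window, one column per order."""
    noise = np.asarray(noise, dtype=float)
    n_win = int(round(window_s * fs))          # samples per window, N
    step = max(1, int(round(step_s * fs)))     # hop between window starts
    rows = []
    for start in range(0, noise.size - n_win + 1, step):
        seg = noise[start:start + n_win]
        rows.append([np.mean(seg ** p) for p in orders])
    return np.array(rows)
```

The standard deviation of each column over a 3-minute segment then gives a simple measure of how much each moment varies with time, which can be compared with the same quantity for simulated stationary Gaussian noise.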

The  moments  for  channel  2  using  a  one-second 
processing  window  during  the  moderate  noise  period  are 
given  in  Fig.  1.  For  comparison,  simulated  stationary 
Gaussian  noise  with  the  same  standard  deviation  and 
approximate  color  as  the  data  segments  is  also  analyzed 
with  the  results  shown  in  Fig.  2.  The  moment  means 
and  standard  deviations  are  given  above  each  plot.  The 
colored  Gaussian  noise  is  high-pass  filtered  in  the  same 
manner  as  the  data.  As  expected,  the  filtering  has 
little  effect  on  the  moments  of  the  Gaussian  noise.  A 
similar  analysis  is  performed  for  the  remaining  data, 
and  is  summarized  by  the  moment  means  and  standard 
deviations  in  Tables  1,  2,  and  3  in  the  Appendix.  The 
three  sets  of  Gaussian  noise  in  each  table  have  the 
same  3-minute  standard  deviation  as  the  corresponding 
shipping  noise. 

From  visual  comparison  of  Figs.  1  and  2,  the 
shipping  data  appear  nonstationary  over  the  3-minute 
time  segment.  Theoretically,  the  moments  of  a 
stationary  process  are  constant  over  time.  The 
variability  seen  in  Fig.  2  is  a  result  of  using  finite, 
rather  than  infinite,  sums  in  the  moment  calculations. 
If  the  shipping  data  were  also  stationary,  they  should 
have  a  similar  variation  in  their  moments. 
Comparison  of  the  standard  deviations  of  the  four  data 


moments  and  the  stationary  Gaussian  noise  moments 
reveals  that  the  moments  of  the  shipping  noise  vary 
much  more  than  those  of  the  stationary  noise  for  this 
case.  To  some  extent,  these  differences  can  be  used  to 
quantify  the  degree  of  nonstationarity  present  in  the 
data. 


Fig. 1. Moments vs time for the channel
2 moderate-noise shipping data.


Even in the presence of nonstationarities, inferences
about the local Gaussianity of the data can be made.
For a zero-mean Gaussian process, infinite sums in the
moment calculations should result in $m_1 = m_3 = 0$ and
$m_4 = 3m_2^2$. The use of finite sums in the calculations
results in deviations from these relationships. For
example, the mean value of $m_4$ for the Gaussian noise
shown in Fig. 2 shows a 1.35% difference from $3m_2^2$.
However, the corresponding comparison for the
shipping data shown in Fig. 1 reveals a -34.17%
difference, indicating a departure from Gaussianity,
averaged over time, that is not due to finite sums. For
the data analyzed, this is by far the largest difference
between the data and simulated Gaussian averages. For
the low and high levels of noise at channel 2, the
differences are -1.18% and -3.02%. At channel 43, the
differences between $3m_2^2$ and $m_4$ for the moderate and
high levels of noise are -6.98% and -3.51%. The
various sets of Gaussian noise show differences
between 1.35% and 2.48%. Except for the moderate
noise at channel 2, the means of the fourth moments of
the shipping data match those for simulated Gaussian
noise closely and have relationships to the second
moment that are consistent with Gaussianity. In
contrast, the means of the third moments of the
shipping data do not match those for the simulated
Gaussian noise except for the high shipping noise at
channel 43, which differs by only -7.34%. The
remaining sets of data have third moments that differ
from the simulated Gaussian noise by at least 80.95%.

Although the values for $m_3$ in Tables 1 and 2 appear
large, they are small when compared to the magnitude
of the cube of the data.
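
The percentage differences quoted in this section can be reproduced from the tabulated moment means with a short calculation. The sign convention used below, $(3m_2^2 - m_4)/(3m_2^2)$, is inferred from the quoted values rather than stated in the text, and the function name is hypothetical.

```python
def pct_diff_from_gaussian(m2, m4):
    """Percent difference between the Gaussian prediction 3*m2**2 and m4."""
    expected = 3.0 * m2 ** 2
    return 100.0 * (expected - m4) / expected


# Channel 2, moderate noise period (moment means from Table 1)
gaussian = pct_diff_from_gaussian(5.16e10, 7.88e21)   # simulated Gaussian noise
shipping = pct_diff_from_gaussian(5.18e10, 1.08e22)   # shipping noise
print(round(gaussian, 2), round(shipping, 2))
```

The two results correspond to the 1.35% and -34.17% figures given for Fig. 2 and Fig. 1, respectively.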


Fig. 2. Moments vs time for the simulated colored
Gaussian noise.

4.  K-S  Test  for  Gaussianity 

Although the shipping data have periods of
nonstationarity, the Kolmogorov-Smirnov (K-S) test can
be used to assess the local Gaussianity of the data.
Using a 1-second sliding window with 90% overlap,
the K-S test was applied to both the shipping data and
the simulated stationary Gaussian data. Values of the
K-S statistic over the three-minute moderate noise time
period for channels 2 and 43 are shown in Figs. 3 and
4. The horizontal lines at 0.035 represent the level
above which the K-S statistic is different from a
theoretical Gaussian distribution at the 5% significance
level. The Gaussian assumption is rejected 18.38% of
the time for channel 2. In contrast, Fig. 4 shows that
the Gaussian assumption is rejected only 0.0057% of
the time for channel 43. The simulated Gaussian noise
is never rejected as Gaussian at the 1% level and is
rejected less than 0.0458% of the time at the 5% level.
The average K-S statistics for all the Gaussian noise
sets range from 0.0163 to 0.0175. The average K-S
statistics for the shipping data are also within this
range, except for the noise at channel 2 during the
moderate noise period. However, channel 2 during the
moderate and high noise periods exhibits local peaks
suggesting local nonGaussianity.

Fig. 3. K-S test for channel 2 moderate
level shipping noise.

Fig. 4. K-S test for channel 43 moderate
level shipping noise.
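
The 0.035 level can be checked against the asymptotic K-S critical value: a 1-second window at 1500 samples/second contains N = 1500 samples, and the 5% critical value $\sqrt{-\tfrac{1}{2}\ln(0.05/2)}/\sqrt{N} \approx 1.358/\sqrt{1500}$ is about 0.035. The Python sketch below computes the K-S statistic of one window against a Gaussian distribution. Standardizing each window by its own sample mean and standard deviation is an assumption here (the text does not say how the reference distribution was parameterized), and the function names are hypothetical.

```python
import math

import numpy as np


def ks_statistic_gaussian(x):
    """K-S distance between the empirical CDF of x and a fitted Gaussian CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    upper = np.arange(1, n + 1) / n - cdf     # empirical CDF above the model
    lower = cdf - np.arange(0, n) / n         # model above the empirical CDF
    return float(max(upper.max(), lower.max()))


def ks_critical_value(n, alpha=0.05):
    """Asymptotic two-sided K-S critical value for sample size n."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
```

Applying `ks_statistic_gaussian` to each window of a sliding-window pass and counting the fraction of windows above `ks_critical_value(1500)` gives a rejection percentage of the kind reported here.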

5.  Conclusions 

The  higher  order  statistics  of  ambient  noise  due 
primarily  to  nearby  ship  traffic  in  a  port  area  are 
analyzed.  The  analysis  includes  two  channels  of  data 
from  a  vertical  array  for  three  different  periods  of  noise. 
Examination  of  the  first  through  fourth  order  moments 
over  time  reveals  apparent  nonstationarities  in  the 
shipping  data.  Comparison  with  moments  of 
stationary  colored  Gaussian  noise  supports  this 
conclusion. While the third order moments of the
shipping data differ somewhat from those of the
Gaussian noise, the fourth order moments differ much
less. The Kolmogorov-Smirnov test indicates that the
noise  at  the  deeper  hydrophone  appears  to  have  periods 
of  local  nonGaussianity  during  the  moderate  and  high 
noise  segments,  while  the  noise  at  the  shallower 
hydrophone,  and  at  both  hydrophones  during  the  low 
noise  segment,  appears  relatively  Gaussian. 
Noise Type                          m1         m2         m3         m4

Low Gaussian Noise         mean     0.34       1.10e10    7.98e12    3.54e20
                           std      308.60     5.06e8     8.33e13    3.35e19
Moderate Gaussian Noise    mean     0.21       5.16e10    4.23e13    7.88e21
                           std      -943.20    3.49e9     1.05e15    1.14e21
High Gaussian Noise        mean     0.26       4.85e10    3.73e13    6.93e21
                           std      760.90     2.85e9     9.00e14    8.54e20
Low Shipping Noise         mean     -7.40      1.11e10    1.52e12    3.74e20
                           std      670.80     1.69e9     8.83e13    1.21e20
Moderate Shipping Noise    mean     -3.90      5.18e10    7.80e13    1.08e22
                           std      955.70     1.01e10    2.06e15    3.98e21
High Shipping Noise        mean     -1.50      4.87e10    -1.13e14   7.33e21
                           std      1028.00    8.39e9     9.38e14    2.59e21

Table 1. Channel 2 moment means and standard deviations using a 1-second
processing window.


Noise Type                          m1         m2         m3         m4

Moderate Gaussian Noise    mean     0.45       5.91e10    7.27e13    1.03e22
                           std      1007.00    3.62e9     1.13e15    1.30e21
High Gaussian Noise        mean     0.62       6.31e10    7.22e13    1.17e22
                           std      964.80     3.70e9     1.37e15    1.44e21
Moderate Shipping Noise    mean     -6.50      5.96e10    1.61e14    1.14e22
                           std      1407.00    1.34e10    1.29e15    4.88e21
High Shipping Noise        mean     -4.30      6.37e10    7.75e13    1.26e22
                           std      1441.00    1.03e10    1.45e15    4.12e21

Table 2. Channel 43 moment means and standard deviations using a 1-second
processing window.


                           Channel 2                         Channel 43
Noise Type                 Average K-S  1% Level  5% Level   Average K-S  1% Level  5% Level
                           Statistic                         Statistic

Low Shipping Noise         0.0163       0.000     0.000
Moderate Shipping Noise    0.0280       3.347     18.28      0.0171       0.000     0.0057
High Shipping Noise        0.0171       0.0011    0.3777     0.0177       0.000     0.0343

Table 3. Average K-S statistic and percentage of time the K-S
statistic is above the 1% and 5% significance levels.

Abstract

A method of constrained adaptive beamforming
employing a self-focussing technique for an uncalibrated
array is described and experimental results are
presented. Constrained adaptive beamforming was
employed to assess the performance of adaptive nulling to
suppress non-Gaussian atmospheric noise and Doppler-
spread ionospheric clutter received during bistatic radar
experiments. A self-focussing method based on principal
component analysis of the received data was devised and
used to estimate the induced steering vectors of sources
of interest. The procedure is described and results of
subspace projection nulling of the unwanted noise and
interference to enhance signal-to-noise ratio are
presented.
Introduction 

A  method  of  constrained  adaptive  beamforming 
employing  a  self-focussing  technique  was  used  with  an 
uncalibrated  array.  The  receiving  antenna  consisted  of  a 
thinned  planar  array  of  96  high  frequency  vertical 
elements  deployed  randomly  over  a  3-kilometer  aperture 
to  produce  a  narrow  pencil  beam,  but  with  elevated 
sidelobes.  Constrained  adaptive  beamforming  was 
employed  to  assess  the  performance  of  adaptive  nulling 
to  suppress  non-gaussian  atmospheric  noise  and  Doppler- 
spread  ionospheric  clutter  received  during  bistatic  radar 
experiments.  During  the  experiments  the  random  array 
had  not  been  calibrated  for  receiver  phase  and  amplitude 
differences,  or  cabling  differences,  although  the  surveyed 
positions of the elements were known. A self-focussing
method based on principal component analysis of the
received data was devised and used to estimate the
induced steering vectors of sources of interest. The
procedure is described and results of subspace
projection nulling of the unwanted noise and interference
to enhance signal-to-noise ratio are presented below.

¹ This work was performed as part of the sponsored research
program of The MITRE Corporation while the author was
affiliated with Gemini Industries, Inc.
² The author is currently affiliated with MIT Lincoln Laboratory,
Lexington, MA.

The  Experiment 

An  aerial  view  of  the  receiver  site  taken  during 
construction  is  shown  in  figure  1,  indicating  the  cable 
runs  to  the  96  elements  randomly  deployed  over  the 
essentially  planar  aperture.  Figure  2  shows  a  schematic 
diagram  of  the  adaptive  processing  applied  to  the 
elements  of  the  planar  antenna  array.  The  principal 
component  inverse  version  of  the  generalized  sidelobe 
canceller,  described  by  Kirsteins  and  Tufts,  was  used. 
This  was  applied  after  range-Doppler  processing  the 
outputs  of  the  antenna  array  receiver  elements,  which 
formed the receiving station of a bistatic over-the-horizon
radar transmitting a linear FM waveform. When the
analysis  was  performed,  the  array  data  had  not  been 
calibrated  to  compensate  for  receiver  phase  and 
amplitude  differences,  cable  length  differences,  and  siting 
errors.  Therefore,  instead  of  computing  the  beamsteering 
vectors  and  beamforming  constraint  vectors  from  the 
known  parameters  of  the  array  based  on  a  planar 
wavefront  propagation  assumption,  it  was  necessary  to 
estimate  these  vectors  from  the  principal  components 
induced  by  the  incident  field,  accomplished  by  applying 
an  eigenvector  beamforming  technique.  To  estimate  the 
beamsteering  weight  vector  that  points  the  array  in  the 
direction  of  a  received  signal  at  a  particular  Doppler 
frequency  in  a  range  cell  being  analyzed,  a  subset  of 
Doppler  samples  for  the  array,  local  to  the  chosen 
Doppler  frequency,  was  used  to  estimate  a  signal 
subspace  for  that  frequency.  The  principal  eigenvector 
belonging  to  the  largest  eigenvalue  was  used  to  estimate 
the  beamforming  vector  w^.  The  beamforming  constraint 
matrix  C  was  formed  from  beamforming  vectors 
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Figure  1.  Aerial  View  of  the  Receiving  Array  Site 
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Figure  2.  Principal  Component  Inverse  Adaptive  Sidelobe  Canceller 
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similarly  estimated  in  adjacent  range  cells.  This 
procedure  was  repeated  as  a  sliding  window  of  Doppler 
cells  was  scanned  across  the  Doppler  spectrum  of  the 
range  cell  to  detect  Doppler  shifted  signals  and  observe 
the  adaptive  spatial  cancellation  of  the  background  noise 
and  Doppler-spread  ionospheric  clutter.  Since  the 
illuminated  ground  clutter  spectra  are  easily  recognised, 
not  important  to  the  detection  of  Doppler-shifted  signals 
and an unnecessary burden for the adaptive spatial
processor,  the  Doppler  cells  corresponding  to  the  ground 
clutter  were  removed  before  forming  the  data  matrix  to 
be  processed  and  analyzed.  The  samples  to  be  excised  in 
each  range  cell  were  determined  by  analysis  of  the 
principal  components  of  the  unexcised  data,  easily 
revealing  the  spectrum  of  the  main  ground  clutter 
component. 
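
The eigenvector-based steering estimate and the projection nulling described above can be illustrated with a minimal sketch. It assumes narrowband complex snapshots stored as Python lists, one list per Doppler sample; the function names, the power-iteration solver and the rank-one, unconstrained projection are illustrative assumptions and not the processing chain used in the experiment.

```python
import math

def sample_covariance(snapshots):
    """Sample covariance R = (1/K) * sum_k x_k x_k^H of K complex snapshots."""
    n, k = len(snapshots[0]), len(snapshots)
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / k
             for j in range(n)] for i in range(n)]

def principal_eigenvector(r, iterations=200):
    """Power iteration for the eigenvector of the largest eigenvalue of Hermitian r."""
    n = len(r)
    v = [complex(1.0, 0.0)] * n          # arbitrary non-zero start (assumption)
    for _ in range(iterations):
        u = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in u))
        v = [c / norm for c in u]
    return v

def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def project_out(w, u):
    """Remove from w its component along the unit vector u: (I - u u^H) w."""
    g = inner(u, w)
    return [wi - ui * g for wi, ui in zip(w, u)]
```

Here `principal_eigenvector(sample_covariance(...))` applied to Doppler samples near a signal plays the role of the estimated beamsteering vector, and `project_out` removes a single estimated interference direction from it. The linear constraints built from adjacent range cells are not modelled in this sketch.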

Adaptive  Processing  Results 

An  example  of  the  results  of  applying  this  method  of 
adaptive  spatial  processing  to  the  data  is  presented  next. 


We  applied  the  self-focussing  technique  to  range  cell 
166,  and  a  few  adjacent  range  cells,  choosing  a  Doppler 
band  near  2  Hz,  to  estimate  the  beamsteering  weight 
vector  and  linear  constraint  matrix.  The  estimated 
beamforming  constraints  were  then  used  in  an  adaptive 
beamformer  applied  to  all  of  the  range  cells  in  the 
coherent  processing  interval.  Figure  3  shows  the 
unadapted  and  spatially  adapted  Doppler  spectra  for 
range  cell  166  with  the  beamsteering  weight  vector  and 
linear  constraint  matrix  estimated  from  a  band  of  Doppler 
cells  centered  near  2  Hz.  Adaptive  results  are  shown  for 
64  DOF  and  87  DOF.  Figure  4  displays  the  unadapted 
range-Doppler  map  for  128  range  cells  for  this  example 
(129-256)  and  figure  5  displays  the  range-Doppler  map 
after  spatial  adaptation  using  87  DOF.  For  this  example 
the increase in SINR was measured as 23.4 dB for 64
DOF  and  30.3  dB  for  87  DOF.  Other  applications  of  this 
technique  to  the  experimental  radar  data  produced  similar 
results.  It  is  evident  from  the  example  that  substantial 
suppression  of  the  clutter  has  been  obtained  without  loss 
of  signal,  facilitated  by  the  use  of  constraints. 
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Figure  4.  Unadapted  Range-Doppler  Map 
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Figure  5.  Spatially  Adapted  Range-Doppler  Map 
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Abstract 

In this paper, a new method that exploits the ideas of
independent source separation in the context of speech
enhancement in single sensor signals is developed and
tested in various situations. The channel distortions
of the two sensor case are artificially reproduced by
suitable linear and nonlinear filters. Separation is
implemented via a Lagrange neural network. Results on
speech signals are shown.

1.  Introduction 

Recently there has been considerable work on the problem
of source separation (see e.g. [7], [8], [10]). In its
simplest form the problem is, given a linear mixture of
signals (sources), to separate the contribution of each
of the sources present, assuming they are independent.
Other interesting work in the area has been presented
in [3], [6] and [9]. Previous research has focused mainly
on multisensor approaches to the problem, where different
mixtures of the source signals arrive at each one of
the sensors. Such approaches are difficult to use in
practice because of the increased complexity imposed
by the presence of an array. The approach of our work
is to produce estimates of the signals present using just
one sensor. The different distortions normally suffered
by the signals in the channel are modelled locally by
suitably filtering the received signal. A Lagrange
minimisation problem is formed, to be solved by a Lagrange
programming neural network ([11]). The results of the
application of the method to a contaminated speech
signal are included.

2.  The  Source  Separation  Problem 

Consider two independent signals $x_1$ and $x_2$ propagating
in the same medium and two sensors, each receiving
a different mixture of the two signals, i.e.
$y_1 = a_{11}x_1 + a_{12}x_2$ and $y_2 = a_{21}x_1 + a_{22}x_2$.

It can then be shown that the initial signals can be
recovered as ([1]):

$$s_1 = b_1 x_1 = w_{11} y_1 + w_{12} y_2 \qquad (1)$$

$$s_2 = b_2 x_2 = w_{21} y_1 + w_{22} y_2 \qquad (2)$$

where $b_1$ and $b_2$ are constant gains and the $w_{ij}$
depend only on the $a_{ij}$. This recovery may be performed
provided that $a_{11}a_{22} - a_{12}a_{21} \neq 0$.
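
As a worked illustration of (1) and (2), the sketch below inverts a known 2x2 mixing matrix. The function names and the choice of unit gains (b1 = b2 = 1) are assumptions made for the example; in practice the mixing coefficients are unknown and the weights have to be estimated.

```python
def unmixing_weights(a11, a12, a21, a22):
    """Weights w_ij that invert y = A x; only defined when det(A) != 0."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("mixing matrix is singular: sources cannot be separated")
    return (a22 / det, -a12 / det), (-a21 / det, a11 / det)

def separate(y1, y2, weights):
    """Apply s1 = w11*y1 + w12*y2 and s2 = w21*y1 + w22*y2."""
    (w11, w12), (w21, w22) = weights
    return w11 * y1 + w12 * y2, w21 * y1 + w22 * y2
```

The explicit determinant check mirrors the recoverability condition stated above.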

Since the $a_{ij}$ are of course not known, the $w_{ij}$
must be estimated through some kind of optimisation
procedure. The two signals are by assumption independent
and zero mean, implying that their odd powered cross
moments are zero. This fact can be exploited for this
optimisation. Examples of ways to estimate these moments
are given in [1] and [7]. The method of estimation
used in our work will be presented later on in this
paper.

A typical block diagram of a source separation apparatus
is given in figure 1. The first part of the circuit
(marked as 'CHANNEL') reproduces the distortions
that would normally be suffered by the signals in the
channel. The second part (marked as 'NEURAL NET')
is the one that recovers the mixed signals. The weights
$w_{ij}$ are controlled by some adaptive mechanism,
specific to each method.


Figure 1: Standard source separation setup (a 'CHANNEL' block followed by a 'NEURAL NET' block)
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3.  The  single  sensor  case 

The modified arrangement for the new method is depicted
in figure 2. In our case there is only one signal
available, namely the noise contaminated signal
(marked as 'sensor'). A two sensor simulation can be
made in such a manner that the distortions that the signal
would undergo when travelling through a channel
are modelled by passing it through two different filters
(shown in figure 2 as H1 and H2). Some guidelines for
choosing these filters are given later in this paper. This
produces two pseudo-sensor signals, shown as "sensor
1" and "sensor 2". These two signals are then used as
substitutes for the signals from the two sensors.


In  this  study  i  and  j  are  restricted  so  that: 

For  reasons  of  simplicity  only  the  two  source  case  is 
considered. 

4.  Implementation  Issues 

The received signal, which is assumed to be a linear
mixture of the two source signals, is passed through
two separate filters. The two outputs are used in our
setup in the manner of a standard source separation
problem ([5], [7]). These filters should not have high
stopband attenuation, so that both the outputs convey
information about all frequency components of the signals.
Further investigations as to the choice of these
filters are currently under way.

It can easily be seen that the following modification
to the objective function reduces the computational
load considerably:


Figure  2:  Block  diagram  of  the  setup  used  for  the  new 
method 

The adaptation mechanism is further assisted by
the introduction of constraints. A constrained optimisation
problem is set up and its solution implemented
through the use of Lagrange Programming Neural Networks.
This type of neural network is based on
Lagrange minimisation theory. They were chosen because
they permit the introduction of constraints, but they also
exhibit further advantages in terms of speed of convergence,
ability to readapt and good stability. Details
about them are given in [11] and [4].
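
The principle behind a Lagrange programming network can be sketched on a toy problem: the variables follow gradient descent on the Lagrangian while the multiplier follows gradient ascent. The quadratic objective, step size and iteration count below are illustrative assumptions and are unrelated to the speech objective used in this work.

```python
def lagrange_descent_ascent(step=0.05, iterations=2000):
    """Minimise x1^2 + x2^2 subject to x1 + x2 = 1 by discretised Lagrange dynamics."""
    x1, x2, lam = 0.0, 0.0, 0.0
    for _ in range(iterations):
        h = x1 + x2 - 1.0          # constraint violation
        g1 = 2.0 * x1 + lam        # dL/dx1 for L = f + lam * h
        g2 = 2.0 * x2 + lam        # dL/dx2
        x1 -= step * g1            # gradient descent on the variables
        x2 -= step * g2
        lam += step * h            # gradient ascent on the multiplier
    return x1, x2, lam
```

For this toy problem the iteration settles at x1 = x2 = 0.5 with multiplier -1, which is the constrained minimum.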

It  has  already  been  mentioned  that  odd  power  cross 
moments  of  the  outputs  must  be  zero,  and  the  function 
to  be  minimised  is  therefore  taken  to  be 

J  =  (3) 

Li 

subject to the constraint that $s_1 + s_2 = y$, where $y$ is
the received signal. This gives the following Lagrange
function to be minimised:

$$L = J + \lambda (s_1 + s_2 - y) \qquad (4)$$

The update equations for $w_{ij}$ and $\lambda$ can be obtained
by using (1) and (2) and differentiating the above
expression. A steepest descent adaptation is then performed.


^  £'[sf 1  +  +  S2  -  y)  (5) 

»,i  / 

Possible  further  implications  of  this  modification 
are  currently  under  investigation. 

Several alternative methods for estimating the cross
moments of the signals have been investigated. Clearly,
since we are dealing with higher order moments, a large
number of samples must be used to reduce the variance
of the estimate. However, the fact that the signals
cannot be assumed stationary limits
the number of past samples that can be meaningfully
used in the estimation. For these reasons the following
recursive formula was used:

$$(E[s_1^i s_2^j])_n = \phi \times (E[s_1^i s_2^j])_{n-1} + (1 - \phi) \times s_{1,n}^i s_{2,n}^j \qquad (6)$$

where $(E[s_1^i s_2^j])_n$ is the estimate for the moment at time $n$,
$s_{k,n}$ is the value of signal $s_k$ at time $n$ and $\phi$ is a forgetting
factor. Equation (6) provides an unbiased estimate for
the moments. Clearly it produces good estimates of the
value of the moments, since a large number of samples
is involved. Additionally, with a suitable choice of $\phi$, it
can quickly respond to changes in the statistics of the
signal.
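
The recursion in (6) amounts to exponential averaging of the instantaneous product of powers. A minimal sketch follows; the function names, the default forgetting factor and the zero starting value are assumptions made for illustration.

```python
def update_cross_moment(previous, s1, s2, i, j, forgetting=0.99):
    """One step of the recursion: m_n = phi * m_{n-1} + (1 - phi) * s1**i * s2**j."""
    return forgetting * previous + (1.0 - forgetting) * (s1 ** i) * (s2 ** j)

def track_cross_moment(samples1, samples2, i, j, forgetting=0.99):
    """Run the recursion over two equal-length sequences, starting from zero."""
    estimate = 0.0
    history = []
    for s1, s2 in zip(samples1, samples2):
        estimate = update_cross_moment(estimate, s1, s2, i, j, forgetting)
        history.append(estimate)
    return history
```

A forgetting factor close to 1 averages over many samples, while a smaller value tracks changes in the statistics faster at the cost of a noisier estimate.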
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A  variable  gain  adaptation  was  used  to  give  better 
stability  and  eliminate  oscillations  of  the  weights  in  a 
dynamic Lagrange neural network realisation. For stationary
environments the adaptation gain modification
is taken to follow the rule:

$$\mu = \mu_0 \frac{1}{(\text{iteration number})^{\beta}} \qquad (7)$$

where $\beta$ is a positive constant. Typically $0 < \beta < 2$.
This update method is used in current literature ([1]).

It gives an initial, near optimal solution quickly and
then converges with small misadjustment.
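
The decaying gain rule can be written as a one-line helper. This is a sketch only; the function name and the convention that iteration numbers start at 1 are assumptions.

```python
def adaptation_gain(mu0, iteration, beta):
    """Gain mu = mu0 / iteration**beta for iteration = 1, 2, ..."""
    if iteration < 1:
        raise ValueError("iteration numbers start at 1")
    return mu0 / (iteration ** beta)
```

Larger values of beta shrink the gain faster, which damps weight oscillations but also slows later adaptation.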

Solutions for non-stationary cases are currently being
explored.

5.  Results 

Convergence  is  fast  and  due  to  the  variable  gain  there 
are  no  weight  oscillations  after  the  final  values  are  reached. 
Sample  convergence  curves  for  the  weights  of  the  neural 
network  can  be  seen  in  figure  3. 


Figure  3:  Sample  convergence  curve  for  the  weights 


The tests were performed on a single sinusoid plus
white, zero-mean, gaussian noise, speech plus a sinusoid,
and speech plus white, zero-mean, gaussian noise.
Sample results for speech plus white noise can be seen
in figures 5 (the original and the contaminated signals)
and 6 (the reconstructed signals).


Figure  4:  Improvement  in  SNR  after  processing  versus 
input  SNR  (both  measured  as  segmental  SNR) 


Figure  5:  Example  of  the  application  of  the  method: 
Speech  plus  White  Gaussian  Noise,  a:  original  signal, 
b:  contaminated  signal 

The graphs clearly show a definite improvement of
the reproduction of the different signals in each case.
The outputs are acoustically close to their original versions.
The improvement in SNR versus input SNR is
given in figure 4. It can be seen that the proposed
method gives good results in very adverse conditions.
Note that the SNR displayed is a segmental SNR.
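
For readers unfamiliar with the measure, a segmental SNR is commonly computed as the average of per-frame SNR values in dB. The sketch below follows that common definition; the frame length and the handling of silent or error-free frames are assumptions, since the paper does not state them.

```python
import math

def segmental_snr_db(clean, processed, frame_length=256):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    values = []
    for start in range(0, len(clean) - frame_length + 1, frame_length):
        ref = clean[start:start + frame_length]
        est = processed[start:start + frame_length]
        signal = sum(r * r for r in ref)
        error = sum((r - e) ** 2 for r, e in zip(ref, est))
        if signal > 0.0 and error > 0.0:   # skip silent or error-free frames
            values.append(10.0 * math.log10(signal / error))
    if not values:
        raise ValueError("no frame with non-zero signal and error energy")
    return sum(values) / len(values)
```

Averaging in dB per frame gives quiet speech segments the same weight as loud ones, which is why the measure differs from a single global SNR.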

Tests for removing sinusoidal interference from speech
were performed. For an input SNR of -3.7 dB, the output
SNR was 16.12 dB for a fixed frequency of the sine
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Figure  6:  Example  of  the  application  of  the  method: 
Speech  plus  White  Gaussian  Noise,  a:  reconstructed 
signal, b: reconstructed noise


wave (improvement 19.81 dB) and 12.2 dB for a slowly
varying one (improvement 15.9 dB).


6.  Conclusions 

A new method to enhance signals, based on source separation
techniques, is presented. The initial results obtained
are quite promising. Several improvements are
possible in a variety of directions, for example in using
different filters and different objective functions. The
method is potentially useful in other signal processing
problems, such as
Voice Activity Detection. Research is currently under
way to explore the fundamental parameters that
influence this approach in a decisive manner and to
determine the limits of its applicability. Further development
of this work is reported in [2].
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Abstract 

We propose an automatic signal segmentation algorithm
for piecewise constant signals, which is based on
Hidden Markov Models (HMM). It segments the observed
data without the need for training data and initial
conditions. One of the problems for automatic segmentation
using HMMs is the determination of
their number of states. In this paper, the number of
states is estimated according to a maximum a posteriori
(MAP) criterion. The proposed algorithm is iterative.
Its initial conditions are chosen by a Tree-Structure
technique, which is completely data driven.
The segmentation is further improved by the multiscale
technique. The performance is evaluated by computer
simulations.


1.  Introduction 

Signal segmentation is an important problem that
occurs in many applications including speech recognition,
biomedical signal processing, and pattern analysis.
The commonly used maximum likelihood segmentation
tends to have poor performance, since it ignores
the temporal correlation among the samples. To
include temporal correlation, we use Hidden Markov
Models (HMMs) [6]. An optimal segmentation by
HMMs can be achieved by the well known Viterbi algorithm
[6]. However, this algorithm assumes that the
number of states is known and requires sufficient data
for training the estimators to achieve good results [6].
Therefore, it is not quite practical when it is applied
to data for which such information is unavailable.

In  this  paper,  we  propose  a  novel  algorithm  that  can 
circumvent these problems. We analyze piecewise constant
signals whose levels (states) and number of states

*This work was supported by the National Science Foundation
under Award No. MIP-9506743.


are unknown. The number of states and the best segmentation
are determined by a maximum a posteriori
(MAP)  criterion.  The  proposed  algorithm  has  modest 
computational  requirements  even  when  it  is  extended 
to  two  dimensional  data. 

2.  Problem  Formulation 

Let $\mathbf{y}^T = [y_1\, y_2 \cdots y_N]$ be an observed data vector
of $N$ samples comprised of a signal embedded in additive
noise. Let $\mathbf{x}^T = [x_1\, x_2 \cdots x_N]$ be the unobservable
vector of signal states, which is a realization of
an $m$-state HMM process, i.e. $x_i \in \{1, 2, \ldots, m\}$ for
$i = 1, 2, \ldots, N$. Also, suppose that the observed data
can be modeled by

$$y_i = g(x_i) + w_i, \quad i = 1, 2, \ldots, N \qquad (1)$$

where $g(x_i)$ is a function that maps the underlying
state $x_i$ to a constant $\mu_{x_i}$. The vector
$\mathbf{w}^T = [w_1\, w_2 \cdots w_N]$ represents noise, and its elements are
independently distributed Gaussian random variables
with zero mean and unknown variance $\sigma_{x_i}^2$. We model
the underlying state vector $\mathbf{x}$ as a first order HMM, so
the probability mass function of the vector $\mathbf{x}$ is given
by

$$p(\mathbf{x}) = p(x_1) \prod_{i=2}^{N} p(x_i | x_{i-1}) \qquad (2)$$

where $p(x_1)$ denotes the state probability of the first
sample and $p(x_i | x_{i-1})$ is the state transition probability.
The density of $\mathbf{y}$, given the underlying hidden
states $\mathbf{x}$, is

$$f(\mathbf{y} | \mathbf{x}) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_{x_i}^2}} \exp\left\{ -\frac{(y_i - \mu_{x_i})^2}{2\sigma_{x_i}^2} \right\}$$
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The  number  of  states  m,  the  parameters  associated 
with each state, and the state transition
probabilities are unknown.

The  problem  is  to  determine  the  number  of  states 
m,  estimate  all  the  unknown  parameters,  and  label  the 
observed  data  with  one  of  the  m  states. 
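
To make the data model concrete, the sketch below draws a label sequence from a first-order Markov chain and adds Gaussian noise to the level of each label. The function name, the uniform initial state, the single "stay" probability and the 0-based labels are assumptions made for the example; the paper's states are numbered from 1 and its transition probabilities are unknown quantities to be estimated.

```python
import random

def simulate_hmm_signal(n, levels, sigmas, stay_prob, seed=0):
    """Return (labels, data) for a piecewise constant signal in Gaussian noise."""
    rng = random.Random(seed)
    m = len(levels)
    state = rng.randrange(m)                      # uniform initial state (assumption)
    labels, data = [], []
    for _ in range(n):
        labels.append(state)
        data.append(levels[state] + rng.gauss(0.0, sigmas[state]))
        if m > 1 and rng.random() > stay_prob:    # leave the current state
            state = rng.choice([s for s in range(m) if s != state])
    return labels, data
```

With `stay_prob` close to 1 the signal consists of long constant segments, which is the regime the segmentation problem addresses.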

3.  The  Segmentation  Approach 

For a given number of states $m = k$, we would like
to estimate the underlying state vector $\mathbf{x}_k$, where
$x_i \in \{1, 2, \ldots, k\}$. From Bayes' theorem we can obtain the
posterior probability of the state vector $\mathbf{x}_k$, that is,

$$p(\mathbf{x}_k | \mathbf{y}) = \frac{f(\mathbf{y} | \mathbf{x}_k)\, p(\mathbf{x}_k)}{f(\mathbf{y})} \qquad (3)$$

Since we would like to adopt as an estimate the most
probable value of $\mathbf{x}_k$ given the data $\mathbf{y}$, we write

$$\hat{\mathbf{x}}_k = \arg\max_{\mathbf{x}_k} \{ f(\mathbf{y} | \mathbf{x}_k)\, p(\mathbf{x}_k) \}. \qquad (4)$$

Note that $f(\mathbf{y})$ is dropped from (4) because it is not a
function of $\mathbf{x}_k$.

To develop an efficient algorithm that searches for
the optimal solution, we iteratively optimize individually
the $x_i$, $i = 1, 2, \ldots, N$, according to


the one proposed in [2]. We refer to it as Multi-scale
HMM (MS-HMM). It is composed of a series of segmentations
progressing from coarse to fine scale. This
is implemented as follows.

Let the observed data and the underlying labels
at scale $s$, $s = 0, 1, 2, \ldots, s_{max}$, be denoted as $\mathbf{y}^{<s>}$
and $\mathbf{x}^{<s>}$, respectively. The initial label sequence at
scale $s$, $\mathbf{x}^{<s>}$, is obtained from the estimated sequence
$\hat{\mathbf{x}}^{<s+1>}$. The number of data at scale $s + 1$ is only half
of the number at scale $s$. For example, $\mathbf{y}^{<0>} = \mathbf{y}$ and
$\mathbf{x}^{<0>} = \mathbf{x}$ are the observed data and their labels at the
finest (original) scale, respectively. Each sample at the
first scale, $\mathbf{y}^{<1>}$ and $\mathbf{x}^{<1>}$, corresponds to two points
in the original scale $s = 0$. For the observed data we
use $y_i^{<1>} = \frac{1}{2}(y_{2i-1}^{<0>} + y_{2i}^{<0>})$, and for the labels
$x_{2i-1}^{<0>} = x_{2i}^{<0>} = x_i^{<1>}$, for $i = 1, 2, \ldots, N/2$.
Similarly, each sample at scale $s = 2$ corresponds to two
samples at scale $s = 1$, and so on. Note that, at coarser
scales, the noise in the data $\mathbf{y}^{<s>}$ is decreased due to
averaging.
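
The two scale-change operations described above are simple enough to sketch directly. The function names are assumptions, and dropping a trailing unpaired sample for odd-length data is a choice made here for illustration only.

```python
def coarsen(data):
    """Average non-overlapping pairs: one sample at scale s+1 per two samples at scale s."""
    return [(data[k] + data[k + 1]) / 2.0 for k in range(0, len(data) - 1, 2)]

def refine_labels(coarse_labels):
    """Copy each coarse label to the two finer-scale samples it covers."""
    fine = []
    for label in coarse_labels:
        fine.extend([label, label])
    return fine
```

Applying `coarsen` twice yields the data for scale 2, and `refine_labels` carries a converged labelling back down one scale to initialise the next segmentation.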

4.  The  MAP  Solution  for  Number  of 
States 

In  general,  the  number  of  states  is  also  unknown. 
Our  objective  now  is  to  obtain  a  criterion  for  choosing 
this  number.  From  Bayes’  theorem  we  have 


$$\hat{x}_i = \arg\max_{x_i} \{ f(\mathbf{y} | \mathbf{x}_k)\, p(\mathbf{x}_k) \}. \qquad (5)$$

When we eliminate the terms which are not functions
of $x_i$, (5) simplifies to

$$\hat{x}_i = \arg\max_{x_i} \{ f(y_i | x_i)\, p(x_i | x_{i-1})\, p(x_{i+1} | x_i) \}. \qquad (6)$$

This is iteratively solved according to

$$\hat{x}_i^{(j+1)} = \arg\max_{x_i} \{ f(y_i | x_i)\, \hat{p}(x_i | \hat{x}_{i-1}^{(j+1)})\, \hat{p}(\hat{x}_{i+1}^{(j)} | x_i) \} \qquad (7)$$

where $j$ denotes iteration, and $\hat{p}(x_i | \hat{x}_{i-1}^{(j+1)})$ and
$\hat{p}(\hat{x}_{i+1}^{(j)} | x_i)$ are the estimates of the transition probabilities
from $\hat{x}_{i-1}^{(j+1)}$ to $x_i$ and $x_i$ to $\hat{x}_{i+1}^{(j)}$, respectively. The
optimization of (7) can be implemented, for example, by
the Iterated Conditional Modes (ICM) algorithm [1].
Since this is an iterative technique, the initial conditions
play an extremely important role, and therefore
they need to be handled with great care [4], [5]. In our
approach we choose them by a recently developed Tree
Structure (TS) scheme [4].
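
One pass of a local update of the kind in (7) can be sketched as follows. The Gaussian observation model matches the problem formulation, but the function names, the 0-based labels, the in-place sweep order and the requirement that all transition probabilities be strictly positive are assumptions of this sketch.

```python
import math

def gaussian_log_density(y, mean, variance):
    """Log of the Gaussian density with the given mean and variance, evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (y - mean) ** 2 / (2.0 * variance)

def icm_sweep(data, labels, means, variances, transition):
    """One left-to-right pass; transition[a][b] = p(next = b | current = a) > 0."""
    new_labels = list(labels)
    k = len(means)
    for i, y in enumerate(data):
        best_state, best_score = new_labels[i], -math.inf
        for state in range(k):
            score = gaussian_log_density(y, means[state], variances[state])
            if i > 0:                    # left neighbour, already updated in this pass
                score += math.log(transition[new_labels[i - 1]][state])
            if i < len(data) - 1:        # right neighbour, from the previous pass
                score += math.log(transition[state][new_labels[i + 1]])
            if score > best_score:
                best_state, best_score = state, score
        new_labels[i] = best_state
    return new_labels
```

Repeating `icm_sweep` until the labels stop changing gives a local maximum of the posterior, which is why the choice of initial labels matters.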

In (7), the labeling of $x_i$ depends only on $x_{i-1}$
and $x_{i+1}$. At low signal-to-noise ratios (SNRs), the
initial states may contain many mislabeled data samples,
which could lead to poor results. To overcome this
shortcoming, we use a multi-scale technique similar to


(8) 

where $f(\mathbf{y} | \mathbf{x}_k)$ is the likelihood function given the hidden
states and the number of states, $p(\mathbf{x}_k)$ is the probability
mass function of $\mathbf{x}$ given $k$, and $p(k)$ is the a
priori probability mass function of the model with $k$
states. If we let $p(k)$ be uniform, the MAP solution of
(8) becomes

$$(\hat{\mathbf{x}}_k, \hat{k}) = \arg\max_{\mathbf{x}_k, k} \{ f(\mathbf{y} | \mathbf{x}_k)\, p(\mathbf{x}_k) \}. \qquad (9)$$

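Given the number of states $k$, we can find the underlying
labels $\hat{\mathbf{x}}_k$ for $k = 1, 2, \ldots, k_{max}$ by the proposed
algorithm. Once we obtain the $\hat{\mathbf{x}}_k$ for the various $k$, the
number of states is selected according to

$$\hat{k} = \arg\max_k \{ f(\mathbf{y} | \hat{\mathbf{x}}_k)\, p(\hat{\mathbf{x}}_k) \} \qquad (10)$$

$$= \arg\min_k \{ -\ln f(\mathbf{y} | \hat{\mathbf{x}}_k) - \ln p(\hat{\mathbf{x}}_k) \}. \qquad (11)$$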

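To determine $\ln f(\mathbf{y} | \hat{\mathbf{x}}_k)$ we use

$$f(\mathbf{y} | \hat{\mathbf{x}}_k) = \int_{\Theta_k} f(\mathbf{y} | \hat{\mathbf{x}}_k, \theta_k)\, f(\theta_k)\, d\theta_k \qquad (12)$$

where $\theta_k = [\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \ldots, \mu_k, \sigma_k^2]$ is the parameter
vector associated with all the states. If we Taylor
expand $\ln f(\mathbf{y} | \hat{\mathbf{x}}_k, \theta_k)$ around the maximum likelihood
estimate of $\theta_k$, $\hat{\theta}_k$, we obtain

$$\ln f(\mathbf{y} | \hat{\mathbf{x}}_k, \theta_k) \approx \ln f(\mathbf{y} | \hat{\mathbf{x}}_k, \hat{\theta}_k) - \frac{1}{2} (\theta_k - \hat{\theta}_k)^T H_k (\theta_k - \hat{\theta}_k) \qquad (13)$$

where $H_k$ is the Hessian of $-\ln f(\mathbf{y} | \hat{\mathbf{x}}_k, \theta_k)$ evaluated
at $\hat{\theta}_k$. By plugging (13) into (12), we obtain

$$\ln f(\mathbf{y} | \hat{\mathbf{x}}_k) \approx \ln f(\mathbf{y} | \hat{\mathbf{x}}_k, \hat{\theta}_k) - \frac{1}{2} \ln |H_k|. \qquad (14)$$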


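We continue by setting $k = 2$, and using $\hat{\mu}_1 + \epsilon$ and
$\hat{\mu}_1 - \epsilon$, where $\epsilon$ is some small positive number, as initial
conditions to estimate $\mathbf{x}_2^{<2>}$ according to

$$x_i^{<2>} = l \quad \text{if} \quad d(y_i^{<2>}, \mu_l) < d(y_i^{<2>}, \mu_{l'}), \quad l \neq l',$$

where $d(\cdot)$ is the Euclidean distance. Then we update
$\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\sigma}_1^2$, $\hat{\sigma}_2^2$ by

$$\hat{\mu}_l = \frac{1}{n_l} \sum_{i:\, x_i^{<2>} = l} y_i^{<2>} \qquad (20)$$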


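Using a similar approach, we can show that

$$\ln p(\hat{\mathbf{x}}_k) \approx \ln p(\hat{\mathbf{x}}_k | \hat{\phi}_k) - \frac{1}{2} \ln |H'_k| \qquad (15)$$

where $\phi_k$ is the parameter vector of the state transition
matrix for $k$ states, and $H'_k$ is the Hessian of
$-\ln p(\hat{\mathbf{x}}_k | \phi_k)$ evaluated at $\hat{\phi}_k$.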

Based on (11), (14), (15), and after some algebra,
the MAP criterion becomes

$$\hat{k}_{MAP} = \arg\min_k \{ F_y(k) + F_x(k) \} \qquad (16)$$
where 

*  1 

Fy(fc)  = -ln/(y|xib,ei)  +  1^9inn,-  (17) 

i=l  ^ 

and 

F,(fc)  =  -  lnp(xfc l^t)  +  £  ® 

*=lj=l 

(18) 

where $n_i$ is the number of samples that are in the $i$-th
state, and $n_{ij}$ is the number of jumps from the $i$-th to
the $j$-th state.

This criterion is a penalized maximum likelihood
with a simple interpretation. $F_y(\cdot)$ has two terms,
one corresponding to the fitting error of the applied
model and the other to the penalty for overparameterization.
$F_x(\cdot)$, on the other hand, reflects the temporal
constraints imposed by the HMM and the penalty for
imposing the constraints.


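$$\hat{\sigma}_l^2 = \frac{1}{n_l} \sum_{i:\, x_i^{<2>} = l} (y_i^{<2>} - \hat{\mu}_l)^2 \qquad (21)$$

where $l = 1, 2$, and $n_l$ is the number of samples in
the state $l$. When the labeling process converges, we
estimate the transition probability $p(x_i^{<2>} | x_{i-1}^{<2>})$
by

$$\hat{p}(x_i^{<2>} | x_{i-1}^{<2>}) = \frac{n_{x_{i-1}^{<2>} x_i^{<2>}}}{n_{x_{i-1}^{<2>}}} \qquad (22)$$

where $x_i^{<2>}, x_{i-1}^{<2>} \in \{1, 2\}$, $n_{x_{i-1}^{<2>}}$ is the number
of samples that are in state $x_{i-1}^{<2>}$, and $n_{x_{i-1}^{<2>} x_i^{<2>}}$
is the number of jumps from state $x_{i-1}^{<2>}$ to state
$x_i^{<2>}$. As initial state probabilities $p(x_1^{<2>})$, we use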
the  uniform  distribution.  Now  we  are  ready  to  start 
with the iterative process. We iteratively label the
state vector $\mathbf{x}_k^{<2>}$ using (7) and update all the parameters
until the process converges. We then start
the next scale $s = 1$, and as its initial parameters we
use the final estimates from the previous scale. After
we finish the scale $s = 1$, we repeat the procedure for
$s = 0$.
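
The relative-frequency estimate of the transition probabilities can be sketched as below. The function name and the 0-based labels are assumptions. As a small deviation from (22), the denominator here counts only samples that have a successor, so that every row sums to one, and states that never occur fall back to a uniform row.

```python
def estimate_transitions(labels, k):
    """p(b | a) = (jumps from a to b) / (samples in a that have a successor)."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # uniform fallback for states that never occur (assumption)
        probs.append([c / total for c in row] if total else [1.0 / k] * k)
    return probs
```

Note that a transition never observed in the current labelling gets probability zero here, so some smoothing would be needed before taking logarithms in the labelling step.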

Once the segmentation with $k = 2$ is completed,
we set $k = 3$. We have two sets of initial conditions
for the three state means,
$\{\hat{\mu}_1 + \epsilon,\ \hat{\mu}_1 - \epsilon,\ \hat{\mu}_2\}$ and
$\{\hat{\mu}_1,\ \hat{\mu}_2 + \epsilon,\ \hat{\mu}_2 - \epsilon\}$. We then apply the same
procedure to each of the initial conditions, evaluate
the criterion (16) based on the segmentation results
and choose the one with the smaller criterion value. The
results of the last step are used as initial conditions for
$k = 4$. We continue by increasing $k$ and applying the
same steps until $k = k_{max}$. Finally, we choose the $k$
that minimizes (16).
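
The split-and-relabel initialisation used when moving from one state to two can be sketched as follows. The function names, the value of epsilon, the 0-based labels and the iteration cap are assumptions; the subsequent HMM-based relabelling and the evaluation of criterion (16) are not included.

```python
def nearest_mean_labels(data, means):
    """Assign each sample to the state whose mean is closest."""
    return [min(range(len(means)), key=lambda l: abs(y - means[l])) for y in data]

def update_state_statistics(data, labels, k):
    """Per-state sample mean and variance, in the spirit of (20) and (21)."""
    means, variances = [], []
    for l in range(k):
        members = [y for y, x in zip(data, labels) if x == l]
        if not members:
            raise ValueError("state %d has no samples" % l)
        mu = sum(members) / len(members)
        means.append(mu)
        variances.append(sum((y - mu) ** 2 for y in members) / len(members))
    return means, variances

def split_initialisation(data, epsilon=1e-3, max_iterations=100):
    """Split the global mean into mu +/- epsilon, then alternate labelling and updating."""
    mu = sum(data) / len(data)
    means = [mu + epsilon, mu - epsilon]
    labels = nearest_mean_labels(data, means)
    for _ in range(max_iterations):
        means, _ = update_state_statistics(data, labels, 2)
        new_labels = nearest_mean_labels(data, means)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, means
```

Because the two starting means differ only by a small epsilon, the first labelling simply separates samples above and below the global mean, and the later updates pull the two means towards the actual signal levels.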


6.  Simulation  Results 


5. Implementation of the Proposed Criterion

We implement the procedure by using three different
scales. We start with the assumption that the number
of segments is equal to one, that is $k = 1$, and initialize
all the underlying states to 1 for $s = 0, 1, 2$. We then
evaluate $\hat{\mu}_1$, which is simply the sample mean of all the
data samples, and $\hat{\sigma}_1^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{\mu}_1)^2$.


To verify the performance of the MS-HMM, we applied
it to synthesized and real data. The first experiment
was on a synthesized data record with 500 samples,
three different states, and eight transitions. The
signal intensities of these states were 100, 120, and 140.
The SNR is defined by $\min\{\Delta/\sigma\}$, where $\Delta$ is the
intensity difference of the transitions and $\sigma$ is the noise
standard deviation. The SNR varied
between 1 and 5, and there were 100 trials for each
different SNR. Figure 1 represents the synthesized noisy
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data  for  SNR=2.  The  number  of  different  states  is 
equal to three, whereas the maximum number of possible
states is equal to five. The segmentation results
are shown in Table 1. Figure 2 is the histogram of
the detected state transitions for 100 trials. The peaks
of the histogram are at the correct locations of state
transitions.

In the second experiment we applied the criterion to
real patch clamping data, which are used in the study
of ion permeation mechanisms in biological membranes
[3]. Figure 3 displays the patch clamp data to which
we applied our segmentation algorithm. The result is
shown in Figure 4 with 8 states detected.


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Figure  1.  A  realization  with  3  classes  and 
8  transitions  with  SNR=2. 
SNR    k=1    k=2    k=3    k=4    k=5
1        0     83     17      0      0
2        0      0    100      0      0
3        0      0    100      0      0
4        0      0    100      0      0
5        0      0    100      0      0

Table 1. The entries represent the number
of times k states were detected in 100 trials.


Figure  2.  Histogram  of  the  detected  edges 
from  100  trials  with  SNR=2. 


Figure  3.  Real  data  with  unknown  number 
of  classes. 


Figure  4.  Estimated  signal  from  the  data 
in  Figure  3  with  8  states  detected. 
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Abstract 

Using subspace techniques we present simulated
results investigating variations in the geometry of the
sensor array for EEG measurements. We have shown that
the performance of the subspace techniques degrades as
the  sources  are  brought  closer  to  the  array.  This 
degradation  can  be  counteracted  by  changing  the 
curvature  of  the  array.  An  optimum  array  curvature 
exists  which  exhibits  best  detection  performance  for  a 
given  angle  of  arrival.  We  also  present  preliminary 
results  applying  subspace  techniques  to  a  sample  of  real 
EEG  data. 


Introduction 

Recent work on the estimation of the direction of
arrival (DOA) problem uses the subspace approach to
determine the angular location of multiple emitters, for
example [1]. These approaches include MUSIC [2],
MLM [3] and J&D [4], methods that have
traditionally been applied to the fields of RADAR, SONAR
and seismology. The application area of these techniques
can be extended to other fields such as biomedical
problems [5]. One such application is that of estimating
the  position  of  electrocortical  generators  in  the  brain  from 
the  electroencephalogram  (EEG).  The  estimation  problem 
is  complicated  by  a  number  of  factors: 

•  The  geometry  of  the  array. 

• The source distance from the array.

•  Noise  present  in  the  system. 

The  work  described  here  investigates  the  application 
of  subspace  techniques  to  the  processing  of  signals  where 
two  of  the  above  factors  are  varied.  This  simulates  a 
simplified  environment  similar  to  that  in  which  EEG 
signals  are  measured.  Using  subspace  techniques  on  real 
EEG  data  the  aim  is  to  estimate  the  position  of  possible 
electrocortical  generators  in  the  brain. 

This  paper  consists  of  five  sections.  Section  one 
reviews  some  of  the  work  in  the  area  of  subspace 


techniques  for  solving  spectral  estimation  problems. 
Section  two  describes  the  area  of  application  to  both  EEG 
and  driven  EEG.  The  results  of  simulations  using 
subspace  algorithms,  and  a  discussion  of  the  limitations  of 
these  algorithms  under  the  conditions  outlined,  are 
presented  in  section  three.  Section  four  discusses  the 
results  of  the  application  of  subspace  techniques  to  the 
EEG  context.  The  final  section  offers  conclusions  and 
comments  on  possible  further  work  in  this  area. 

1. Signal Subspace Methods

This section briefly reviews signal subspace methods.
These methods are primarily derived from the covariance
matrix, which is constructed from incoming signal data.
The covariance matrix can be broken down, through the
use of appropriate matrix properties and
eigen-decomposition techniques, into two subspaces: the
signal subspace and a noise subspace [1]-[5].

Assume a system model in which M far-field sources
are viewed by N sensors (N > M). The sensors may exist
in any configuration, for example a linear, circular or
curved array. Consider the system

$$\mathbf{x} = \mathbf{V}\mathbf{s} + \mathbf{n}, \qquad (1)$$

where

$\mathbf{x} = [x(1), x(2), \ldots, x(n), \ldots, x(N)]^T$ represents the signals at
the N sensors at any instant;

$\mathbf{s} = [s(1), s(2), \ldots, s(m), \ldots, s(M)]^T$ represents the plane
wavefronts from the M sources;

$\mathbf{n} = [n(1), n(2), \ldots, n(n), \ldots, n(N)]^T$ represents the receiver
noise contributions to the signals at the N sensors, and the
(N x M) matrix $\mathbf{V}$ represents the response of the N
sensors in the M signal directions. The matrix $\mathbf{V}$ cannot
be specified until the directions to the sources are known,
thus eqn. (1) cannot be solved directly.

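The subspace methods use the covariance matrix of the
system model, which is defined as:

$$\mathbf{C} = E\{\mathbf{x}\mathbf{x}^H\} = E\{(\mathbf{V}\mathbf{s} + \mathbf{n})(\mathbf{V}\mathbf{s} + \mathbf{n})^H\} \qquad (2)$$

where E is the expectation operator and H denotes the Hermitian
(conjugate) transpose. If the sources are uncorrelated with the
receiver noise then:

$$E\{\mathbf{n}\mathbf{s}^H\} = E\{\mathbf{s}\mathbf{n}^H\} = 0 \qquad (3)$$

and if the noise is white Gaussian with variance $\sigma^2$:

$$\mathbf{C} = \mathbf{V}\mathbf{C}_s\mathbf{V}^H + \mathbf{C}_n = \mathbf{V}\mathbf{C}_s\mathbf{V}^H + \sigma^2\mathbf{I} \qquad (4)$$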

The direction finding (DF) problem in this system is
the identification of the direction vectors

$$\mathbf{v}_m^T = [v(1,m), v(2,m), \ldots, v(N,m)], \quad m = 1, \ldots, M \qquad (5)$$

Given that all the possible correlations between pairs
of individual sensor signals exist in C, it is possible
through the use of eigen-decomposition techniques to
decompose the complex space that C spans into two
mutually orthogonal subspaces. These are the signal
subspace and the noise subspace. It can be shown that
either the signal or the noise subspace contains all the
necessary information required to determine the number
of sources and the direction of arrival [2].

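Using the Hermitian property of C we are able to
transform it into a real diagonal matrix $\mathbf{\Lambda}$ using a unitary
matrix U as shown below:

$$\mathbf{U}^H\mathbf{C}\mathbf{U} = \mathbf{\Lambda} \quad \text{or} \quad \mathbf{C} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^H \qquad (6)$$

where the columns of $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_N]$ are the
eigenvectors of C and $\mathbf{\Lambda}$ holds the eigenvalues:

$$\mathbf{\Lambda} = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_N], \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \qquad (7)$$

The transformation can be written as:

$$\mathbf{C} = \sum_{n=1}^{N} \lambda_n \mathbf{u}_n\mathbf{u}_n^H \quad \text{and} \quad \mathbf{C}^{-1} = \sum_{n=1}^{N} \lambda_n^{-1} \mathbf{u}_n\mathbf{u}_n^H \qquad (8)$$

and since $\mathbf{U}^H = \mathbf{U}^{-1}$ (a property of a unitary matrix), the
eigenvectors satisfy $\mathbf{u}_i^H\mathbf{u}_j = \delta_{ij}$, i.e. they
constitute an orthonormal set.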

Assuming that there are more sensors than unknown
sources, i.e. M < N, [2] shows that there must be (N-M)
eigenvalues equal to the noise variance $\sigma^2$. The
corresponding (N-M) eigenvectors form the noise
subspace. As a result, the eigenvectors of C associated
with the M largest eigenvalues are M orthonormal vectors
which span a subspace of the entire complex vector space.
This space is known as the signal subspace and it contains
the signal vectors.

The  subspace  approach  can  be  expressed  concisely  by: 


It can be shown [2] that $\mathbf{V}^H\mathbf{u}_n = 0$ for n = M+1, ..., N. By
sweeping the direction vector $\mathbf{v}(\theta)$ through all possible
values of $\theta$ and over all noise eigenvectors we derive the
MUSIC [2] and Johnson and DeGraaf (J&D) [4]
direction finding functions, whereas MLM [3] is derived
by sweeping $\mathbf{v}(\theta)$ over all eigenvectors.
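
The noise-subspace sweep described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a uniform linear array with half-wavelength spacing, a single far-field source, and the common MUSIC form 1 / ||U_n^H v(θ)||²; the function names (`steering_vector`, `music_spectrum`, `complex_noise`) are hypothetical. The sample count, sensor count, SNR and DOA mirror the values quoted in the simulation section.

```python
import numpy as np

def steering_vector(theta_deg, n_sensors, spacing=0.5):
    """Plane-wave response of a uniform linear array (spacing in wavelengths)."""
    n = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(snapshots, n_sources, scan_deg, spacing=0.5):
    """MUSIC direction finding function from an (N x K) snapshot matrix."""
    n_sensors, n_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snapshots   # sample estimate of C
    _, eigvecs = np.linalg.eigh(cov)                     # eigenvalues ascending
    noise = eigvecs[:, : n_sensors - n_sources]          # N-M noise eigenvectors
    spectrum = np.empty(len(scan_deg))
    for k, theta in enumerate(scan_deg):
        proj = noise.conj().T @ steering_vector(theta, n_sensors, spacing)
        spectrum[k] = 1.0 / np.vdot(proj, proj).real     # 1 / ||U_n^H v(theta)||^2
    return spectrum

def complex_noise(rng, shape, std):
    """Circular complex Gaussian samples with the given standard deviation."""
    return std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Assumed scenario: 8 sensors, 1000 snapshots, 20 dB SNR, one source at -30 degrees.
rng = np.random.default_rng(0)
n_sensors, n_snapshots, true_doa = 8, 1000, -30.0
source = complex_noise(rng, n_snapshots, 1.0)
noise_std = 10 ** (-20 / 20)
data = np.outer(steering_vector(true_doa, n_sensors), source)
data = data + complex_noise(rng, (n_sensors, n_snapshots), noise_std)

scan = np.arange(-90.0, 90.5, 0.5)
spectrum = music_spectrum(data, n_sources=1, scan_deg=scan)
estimated_doa = scan[np.argmax(spectrum)]
print(f"estimated DOA: {estimated_doa:.1f} deg")
```

Sweeping over all eigenvectors with eigenvalue weighting, rather than only the noise eigenvectors, would give an MLM-style function; that variant is not shown here.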


Since  the  discovery  of  the  EEG  60  years  ago, 
innumerable  studies  have  investigated  the  relationships 
between  neural  phenomena,  the  performance  of  cognitive 
tasks,  and  associated  changes  in  the  EEG  which  are  called 
Event Related Potentials [6]. A novel extension of
traditional methodology has been developed by SCAN,
based  on  the  technique  of  Steady-State  Visually  Evoked 
Potentials  (SSVEP)  in  which  the  subject  is  exposed  to  a 
continuously  flickering  visual  driving  signal  whilst 
performing  cognitive  tasks[7].  The  signal  processing 
significance  of  the  visual  driving  signal  is  that  in  excess 
of  38%  of  all  sensory  input  pathways  to  the  brain’s  cortex 
are  linked  to  the  visual  pathways[8],  so  that  driving  the 
visual  pathways  presents  a  substantial  known  input 
driving  signal  to  the  cortex.  The  system  identification 
problems  which  are  intrinsic  to  most  EEG  signal  analysis 
work  are  therefore  ameliorated  to  some  extent. 

The  EEG  is  recorded  using  a  specially  designed  helmet 
with  64  sensors.  The  rigidity  of  the  helmet  guarantees  the 
relative  position  of  the  electrodes,  which  are  positioned 
according to the International 10-20 system. Additional
electrodes are placed at sites resulting in an average
inter-electrode spacing of approximately 2.5 cm [7].

By  measuring  the  spatial  distribution  of  EEG  activity 
under  well-defined,  stringent  test  conditions[7][8], 
estimation  of  the  positions  of  the  electrocortical 
generators  in  the  brain  is  equivalent  to  the  classical 
problem  of  estimating  the  location  of  multiple  emitters. 
The  estimation  problem  is  complicated  by: 

•  The  geometry  of  the  array.  Most  of  the  research  in  the 
area  of  direction  finding  using  spectral  estimation 
techniques  is  based  on  either  linear  or  circular  planar 
arrays.  This  assumption  may  not  be  valid  in  EEG 
measurements  where  the  sensors  are  placed  on  the 
surface  of  the  scalp. 

•  The  well  known  model-based  direction  finding 
techniques  assume  that  the  array  is  far  enough  from 
the  sources  to  ensure  planar  impinging  waves.  This 
may  not  be  true  for  the  EEG  measurements. 

•  Noise  present  in  the  EEG  usually  is  not  Gaussian,  but 
more  likely  closer  to  1/f  noise. 


3.  Simulation  Results 

This  section  describes  the  results  of  simulations  with 
the  subspace  algorithms,  and  a  discussion  of  the 
limitations  of  these  algorithms  under  the  conditions 
outlined  below: 

•  Varying  the  separation  between  sources  and  sensors  to 
investigate  the  effect  of  the  curvature  of  the  arriving 
waves. 

•  Varying  the  radius  of  curvature  of  the  array. 

The following simulations were based on 1000 data
samples, an S/N of 20 dB and 8 sensor elements. Note that
the three subspace algorithms produced similar results, thus
the figures only show the results for MUSIC. Figures 1-3
present results for one source, whilst Figures 4-5 are for
two sources.


Figure 1. DOA = -30°, linear array, x = distance from array.
(Horizontal axis: angle of arrival in degrees.)

Figure 2. x = 5λ, θ = 60°, r = radius of curved array.
(Horizontal axis: angle of arrival in degrees.)


Given  a  linear  array,  figure  1  shows  that  as  the  source 
is  brought  closer  to  the  array  the  performance  of  the 
subspace  algorithms  deteriorates.  This  is  expected  since 
the  curvature  of  the  impinging  wavefronts  increases.  To 
improve  the  performance  in  the  “close  source  case”  we 
investigated  varying  the  radius  of  curvature  of  the  array. 
Figure 2 shows that there is a particular radius of
curvature ($r_{opt}$) of the array which results in optimum
performance of the algorithms.
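
The first observation, degradation as the source approaches a linear array, can be made concrete with a small sketch. It is only an illustration under stated assumptions: a point source radiating spherical waves, 8 sensors at half-wavelength spacing, distances measured in wavelengths from the array centre, and a normalised correlation between the exact and plane-wave array responses as a stand-in for "performance". None of these choices or function names come from the paper, and the curved-array case is not modelled.

```python
import numpy as np

def sensor_positions(n_sensors, spacing=0.5):
    """Sensor coordinates along the array axis, centred on zero (in wavelengths)."""
    return (np.arange(n_sensors) - (n_sensors - 1) / 2) * spacing

def near_field_vector(theta_deg, distance, positions):
    """Exact spherical-wave phases for a source at the given distance from the centre."""
    th = np.deg2rad(theta_deg)
    src_x, src_y = distance * np.sin(th), distance * np.cos(th)
    ranges = np.hypot(src_x - positions, src_y)
    return np.exp(-2j * np.pi * (ranges - distance))

def far_field_vector(theta_deg, positions):
    """Plane-wave approximation: range is approximately distance - p*sin(theta)."""
    return np.exp(2j * np.pi * positions * np.sin(np.deg2rad(theta_deg)))

def model_match(theta_deg, distance, positions):
    """1.0 means the plane-wave model fits exactly; smaller values mean mismatch."""
    a_nf = near_field_vector(theta_deg, distance, positions)
    a_ff = far_field_vector(theta_deg, positions)
    return abs(np.vdot(a_ff, a_nf)) / len(positions)

positions = sensor_positions(8)
match_near = model_match(-30.0, 5.0, positions)      # source 5 wavelengths away
match_far = model_match(-30.0, 1000.0, positions)    # effectively far field
for distance in (5.0, 20.0, 100.0, 1000.0):
    print(f"x = {distance:7.1f} wavelengths: match = {model_match(-30.0, distance, positions):.4f}")
```

A lower match value means the steering vectors swept by a subspace algorithm no longer lie in the true signal subspace, which is one plausible reading of the degradation reported for close sources.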

Figure  3  shows  how  the  performance  of  the  subspace 
algorithms  varies  with  the  direction  of  the  source  when 
the  source  is  close  to  the  array.  An  interesting  observation 


is  that  the  larger  the  angle  of  arrival  the  lower  the  noise 
floor.  A  larger  angle  of  arrival  implies  that  the  distance 
between  the  source  and  the  furthest  sensor  is  greater  than 
when  the  source  has  a  smaller  angle  of  arrival.  This  extra 
distance  implies  the  waves  travel  further  thereby  better 
approximating  a  plane  wave. 


Figure 3. x = 5λ, linear array, θ = DOA.
(Horizontal axis: angle of arrival in degrees.)

Figure 4. DOA θ1 = 60°, θ2 = -30°, x1 = x2 = x, linear array.
(Horizontal axis: angle of arrival in degrees.)

Figure 5. θ1 = 60°, θ2 = -30°, x1 = x2 = 5λ, curved array.
(Horizontal axis: angle of arrival in degrees.)

As expected, in figure 4 we see similar results to those
obtained in figure 1. As the sources are moved closer to
the array the performance of the algorithms degrades.
Note that the strength of the detected peaks is different.
This is expected because the performance of the
algorithms is better for larger angles, as can be seen from
figure 3. Again, varying the radius of curvature of the
array results in the best performance occurring at $r_{opt}$, as
per the single source case. Figure 5 shows that $r_{opt}$
depends on the angle of arrival. Given that both sources
are 5λ from the array, $r_{opt}$ = 5.5λ for source 1 and $r_{opt}$ =
9.5λ for source 2. Note that peaks of equal strength can be
obtained when $r_{opt}$ = 6.7λ for θ1 = 60°, θ2 = -30°.

4.  Analysis  of  Preliminary  EEG  Results 

In  this  section  we  present  the  preliminary  results  of 
applying  the  subspace  techniques  to  the  EEG  data.  The 
EEG  data  was  recorded  in  the  presence  of  visually 
applied  driving  signals  at  a  range  of  frequencies.  The 
particular  case  under  investigation  is  when  the  subject  is 
exposed  to  a  sinusoidal  visual  driving  signal  of  13  Hz. 
Seven sensors were chosen, located from the front to the
back of the head. The sensors chosen were spaced at
approximately 35 mm. Assuming an average wave
velocity in the cortex of 7 m s$^{-1}$, the separation becomes
approximately 0.065 wavelengths. The EEG signal was
filtered to remove unwanted components, the correlation
matrix formed and the direction functions plotted.
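
The quoted spacing of about 0.065 wavelengths follows directly from the stated figures, as this short check shows. The variable names are illustrative; the numbers are those given in the text (13 Hz driving signal, 7 m/s wave velocity, 35 mm spacing).

```python
wave_speed = 7.0         # m/s, assumed average wave velocity in the cortex
drive_frequency = 13.0   # Hz, visual driving signal
sensor_spacing = 0.035   # m, approximate inter-sensor spacing

wavelength = wave_speed / drive_frequency            # about 0.54 m
spacing_in_wavelengths = sensor_spacing / wavelength

print(f"wavelength = {wavelength:.3f} m")
print(f"spacing = {spacing_in_wavelengths:.3f} wavelengths")
```

At this spacing the seven-sensor array spans well under half a wavelength in total, so its aperture is small compared with the half-wavelength-spaced arrays commonly assumed in subspace simulations.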


Angle  of  Arrival  (deg) 

Figure  6 

The result shown in figure 6 indicates that there is no
localisation of the sources, i.e. no distinct angle of arrival
is identifiable. This result is promising in that the data
used  was  obtained  using  a  visual  stimulus  of  13  Hz.  This 
stimulation  excites  the  parts  of  the  brain  which  are  not 
taking  part  in  any  other  activity,  hence  we  expect  several 
areas  of  the  brain  would  pose  as  possible  source 
locations. 

To improve the estimate of the location of the sources
the following factors need to be considered:

The curvature of the detection array of sensors needs
to be adjusted to obtain $r_{opt}$. In this particular experiment
it wasn't possible to alter the curvature of the detection
array. This will be the subject of a subsequent paper.

The result also shows that MUSIC gives a different
outcome in comparison to J&D and MLM. This
observation may be due to the fact that the MUSIC
method doesn't have a weighting function; see eqns. (10),
(11) and (12).


Conclusion 

We  have  shown  that  the  performance  of  the  subspace 
techniques  degrades  as  the  sources  are  brought  closer  to 
the  array.  This  degradation  can  be  counteracted  by 
changing  the  curvature  of  the  array.  An  optimum  array 
curvature  exists  which  exhibits  best  detection 
performance  for  a  given  angle  of  arrival. 

Application of the subspace techniques to the EEG
context seems promising as a method for the location of
electrocortical generators in the brain. Whilst the results
obtained here are encouraging, there is scope for further
work, for example the investigation of a compensation
filter which would correct for a non-optimum array
curvature. This may need to be implemented using an
artificial neural network, because the exact source
distance will be different for different cases.
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Abstract 

A  two-step  incoherent  signal  subspace 
averaging  technique  is  applied  to  locate  the  sonobuoy 
in  the  presence  of  a  highly  coherent  environment 
generated  by  the  scattering  of  the  sonobuoy  signals 
from  the  aircraft  propellers.  The  proposed  technique  is 
based  on  the  assumption  that  accurate  modeling  of  the 
scattering  modulation  effects  of  the  propellers  is 
available.  This  information  gives  insights  into  the 
relative  contributions  of  the  direct  and  multipath 
components  to  the  signal  subspace.  The  first  step  of 
the  proposed  technique  amounts  to  modifying  the 
MUSIC  spectrum  by  projecting  a  weighted  sum  of 
steering  vectors  onto  the  noise  subspace.  The  second 
step  is  to  perform  incoherent  subspace  averaging  across 
the  sonobuoy  frequency  channels.  We  show  that 
significant  improvement  is  achieved  using  the  proposed 
technique  over  the  case  of  applying  noise  subspace 
eigenstructure  methods. 

1.  Introduction 

One  application  of  an  airborne  antenna  array 
is  to  receive  information  from  sonobuoys  as  well  as  to 
locate  their  positions.  The  aircraft  drops  sonobuoys  in 
the  water  and  starts  to  monitor  their  signal 
transmission.  As  the  aircraft  moves  along  its 
flight  path,  the  same  buoy  could  be  at  different 
positions  at  different  times  with  respect  to  the  aircraft, 
due  primarily  to  aircraft  motion  and  to  a  much  lesser 
extent,  movement  of  the  buoy  by  ocean  current.  The 
buoy  could  be  at  some  distance  from  the  aircraft  or 
directly  underneath  it.  Therefore,  the  incident  angle  of 
the  signal  on  the  aircraft  array  and  propellers  will  not 
be  constant.  One  of  the  primary  tasks  for  these  flying 
missions  is  to  locate  the  sonobuoy  with  reasonable 
accuracy. 

With the blades of the aircraft propellers in
continuous rotation, the sonobuoy signal scattered from
the propellers and arriving at the array can be expected
to be a modulated version of the direct path signal.
Models describing the distributed source environment [1],
and specifically the propeller return [2], have recently
been described. Also, accurate scattering calculations can
be performed using the FDTDC [3]. Except for the angle
of arrival of the direct path, almost all parameters
defining the modulation effects are known [1]. These
include the rotational speed, the distances of the blade
tips and roots from the center of rotation, the number of
blades and propellers, and the range of the propeller from
the array.

This work is supported by ONR grant N00014-94-1-1052

The  modulation  effects  of  the  propellers 
depend  on  several  parameters.  A  number  of  sidebands 
often  result  about  the  center  frequency  of  the  sonobuoy 
signal.  Depending  on  the  number  of  the  propeller 
blades  and  the  frequency  of  rotation,  one  or  several 
sidebands  of  the  propeller  reflection  fall  into  the 
information  bandwidth  of  the  direct  signal,  causing 
severe  multipath  degradation  effects  on  the  performance 
of  the  localization  and  nulling  techniques  of  the 
airborne array system. Due to the coherent signal
environment, the optimum beamformers (high
resolution DOA spectra) not only fail to form nulls
(peaks) in the direction of the signals incident on the
array, but also tend to cancel the desired (look
direction) signal in the output. This failure occurs even
with the decorrelation effects introduced by the motion
of the array on the aircraft [4,5].

Preprocessing  techniques  may  prove 
inefficient  for  the  underlying  problem.  Spatial  filtering 
methods  [6,7]  place  a  low  array  gain  over  the 
interference  spatial  sector  to  remove,  or  at  least 
attenuate  the  interference  outside  the  sector  of  interest. 
Subarray  averaging  methods  [8,9]  decorrelate  the 
coherent  arrivals  under  certain  conditions  which  relate 
the number of sources and array sensors. Both methods
reduce the array aperture and are impractical for planar
arrays with a small number of sensors. Further, the
decorrelation methods are only appropriate for point sources
and do not work with models of distributed sources.

In this paper, we estimate the sonobuoy
elevation and azimuth positions by making use of 1)
the knowledge of the propeller spatial coordinates
relative to the array, and 2) the availability of accurate
modeling of the multipath signals. While the former
defines the spatial sector of the multipath signals, an
accurate model of the propellers provides the means to
obtain the phase and power of the multipath
signal relative to its direct path in each of the
sonobuoy's frequency channels.

Due to the narrowness of the interference
spatial sector and/or the strong coherence of the
multipath signals, the propeller-scattered waveforms
can be represented by a rank-one covariance matrix, i.e.,
their source representation subspace is spanned by a
single vector [10]. This vector can be selected as the
directional vector corresponding to the center angle of
the spatial sector or, more accurately, can be chosen as
the principal eigenvector of the covariance matrix.
Each snapshot can, therefore, be modeled as a
weighted sum of three directional vectors: one
corresponds to the sonobuoy direct path and the others
to the two propellers on the aircraft. This paper
assumes knowledge of fixed relative weights, and
modifies the noise subspace based eigenstructure
methods [11] so that their spectra include only peaks
whose number and locations, respectively, equal the
number and positions of sonobuoys in the field of
view.


2. The Principal Assumption

The covariance matrix of the multipath signals over
the sector $\Theta_i$, i = 1, 2, for the two aircraft propellers can
be expressed as

$$\mathbf{R}_i = \iint_{\Theta_i} \eta(\theta,\varphi)\,\mathbf{a}(\theta,\varphi)\,\mathbf{a}^H(\theta,\varphi)\,d\theta\,d\varphi \qquad (1)$$

where $\mathbf{a}(\theta,\varphi)$ is the directional vector, which is a
function of both the elevation and the azimuth, and
$\eta(\theta,\varphi)$ provides the distribution of the energy over
$\Theta_i$. The source representation subspace of the propeller
scatters is taken as the principal eigenvector $\mathbf{e}_i$ of $\mathbf{R}_i$.
Therefore, the vector spanning the subspace of the
overall covariance matrix of the coherent direct and
multipath signals takes the form

$$\mathbf{e}_s = \mathbf{s}(\theta,\varphi) + \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 \qquad (2)$$

The weights $\alpha$, $\beta$ are complex values and reflect the
propeller amplitude and phase changes to the sonobuoy
signal. The primary assumption in this paper is the
a priori knowledge of these weights from existing
distributed source and propeller scattering models [1-3].
In the simulation section, we assume equal
distribution of energy over the spatial sectors of both
aircraft propellers. In this case, the covariance matrix
in (1) is produced by replacing the integral with a sum
and incorporating several directional vectors which
uniformly cover the interference sectors.

3. Modified Eigenstructures

For narrowband signals, the multiple signal
classification (MUSIC) spectrum is given by

$$S(\theta,\varphi) = \frac{1}{\mathbf{s}^H(\theta,\varphi)\,\mathbf{V}\mathbf{V}^H\,\mathbf{s}(\theta,\varphi)} \qquad (3)$$

where 'H' stands for the Hermitian transpose. The matrix $\mathbf{V}$ spans
the noise subspace, which in the underlying problem
is of dimension N x (N-L), where L is the number of
sonobuoys. In the proposed technique, we use
equation (2) to gain insights into the makeup of the
signal subspace through proper modeling of the aircraft
propeller effects. We modify the above equation to

$$\tilde{S}(\theta,\varphi) = \frac{1}{\tilde{\mathbf{s}}^H(\theta,\varphi)\,\mathbf{V}\mathbf{V}^H\,\tilde{\mathbf{s}}(\theta,\varphi)} \qquad (4)$$

where

$$\tilde{\mathbf{s}}(\theta,\varphi) = \mathbf{s}(\theta,\varphi) + \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 \qquad (5)$$

The difference between (3) and (4) is that in (4), the
steering vector $\mathbf{s}$ along with the two fixed vectors $\mathbf{e}_1$, $\mathbf{e}_2$
is projected onto the noise subspace.

For broadband signals, the above projection is
averaged over the entire frequency band of the
sonobuoy signal. Assuming M channels, then

$$\tilde{S}(\theta,\varphi) = \frac{M}{\sum_{i=1}^{M} \tilde{\mathbf{s}}^H(\theta,\varphi;f_i)\,\mathbf{V}(f_i)\mathbf{V}^H(f_i)\,\tilde{\mathbf{s}}(\theta,\varphi;f_i)} \qquad (6)$$

where

$$\tilde{\mathbf{s}}(\theta,\varphi;f_i) = \mathbf{s}(\theta,\varphi;f_i) + \alpha_i\mathbf{e}_1(f_i) + \beta_i\mathbf{e}_2(f_i)$$

The above type of averaging constitutes the second
step of the proposed technique and is similar to
incoherent subspace averaging [10], which is proposed
for increasing the SNR.
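
The narrowband modification in equations (4) and (5) can be sketched as follows. This is a simplified illustration, not the paper's simulation: it assumes a uniform linear array with a single angle instead of a planar array with elevation and azimuth, represents each propeller by one directional vector, and uses made-up weights and angles. All function names are hypothetical. Setting both weights to zero recovers the conventional spectrum of equation (3).

```python
import numpy as np

def steer(theta_deg, n_sensors, spacing=0.5):
    """Steering vector of a uniform linear array (an assumed geometry)."""
    n = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def complex_noise(rng, shape, std):
    return std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def noise_subspace(snapshots, n_sources):
    """Eigenvectors of the sample covariance with the N-L smallest eigenvalues."""
    n_sensors, n_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snapshots
    _, vecs = np.linalg.eigh(cov)                  # eigenvalues ascending
    return vecs[:, : n_sensors - n_sources]

def modified_music(noise, scan_deg, e1, e2, alpha, beta):
    """Project s(theta) + alpha*e1 + beta*e2 onto the noise subspace."""
    n_sensors = noise.shape[0]
    out = np.empty(len(scan_deg))
    for k, theta in enumerate(scan_deg):
        s_tilde = steer(theta, n_sensors) + alpha * e1 + beta * e2
        proj = noise.conj().T @ s_tilde
        out[k] = 1.0 / np.vdot(proj, proj).real
    return out

# Hypothetical scenario: direct path at 15 deg, propeller returns at 56 and -49 deg.
rng = np.random.default_rng(2)
n_sensors, n_snapshots, direct_doa = 8, 1024, 15.0
e1, e2 = steer(56.0, n_sensors), steer(-49.0, n_sensors)
alpha, beta = 0.8 * np.exp(0.7j), 0.6 * np.exp(-1.1j)    # assumed known weights

composite = steer(direct_doa, n_sensors) + alpha * e1 + beta * e2
signal = complex_noise(rng, n_snapshots, 1.0)
data = np.outer(composite, signal) + complex_noise(rng, (n_sensors, n_snapshots), 0.1)

noise = noise_subspace(data, n_sources=1)
scan = np.arange(-90.0, 90.5, 0.5)
modified = modified_music(noise, scan, e1, e2, alpha, beta)
conventional = modified_music(noise, scan, e1, e2, 0.0, 0.0)
estimated_doa = scan[np.argmax(modified)]
print(f"modified MUSIC peak at {estimated_doa:.1f} deg")
```

Because the direct and scattered paths are fully coherent, the signal subspace here is one-dimensional and spanned by the composite vector, so only the modified steering vector becomes orthogonal to the noise subspace at the true direction.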

4.  Simulations 

In the first example, the simulation performed
consisted of two groups of completely correlated
signals. The first group consists of a desired signal
arriving at (θ,φ) = (15,15) degrees with two clusters of
multipath signals arriving on the different sides of the
desired signal. The two clusters, representing the
propeller scattering signals, are centered at (56,170) and
(49,10) degrees. Each cluster spans Δθ = 3, Δφ = 6
degrees. The second group has a desired signal arriving
at (35,-60) degrees, also with two clusters of
propeller multipath signals arriving at the same




angles  and  spanning  the  same  spatial  sector  as  in  the 
first  group.  The  clusters  associated  with  each  desired 
signal  are  correlated  with  the  desired  signal  for  that 
group  only.  The  sonobuoy  signals  are  20  dB  higher 
than  the  uncorrelated  Gaussian  noise.  The  number  of 
data  snapshots  taken  to  generate  the  estimate  of  the 
noise  subspace  was  1024. 

The comparison of the conventional
two-dimensional MUSIC algorithm and the proposed
technique is shown in Figure 1. The proposed
technique resolves the direction of arrival (DOA) of the
desired signal of each group whereas the MUSIC
algorithm completely fails to resolve the DOA of any
of the incoming signals. Figure (1-a) shows the
contour plot for the MUSIC algorithm applied just to
the first group. Figure (1-b) shows the contour plot
for the proposed technique, where the source location is
successfully estimated at (15,15) degrees. Figure (1-c)
shows the contour plot for the MUSIC algorithm
applied to the above two groups occurring
simultaneously. Figure (1-d) shows that the proposed
technique correctly resolves the DOAs for the desired
signals from each group, (15,15) and (35,-60).

The second example deals with a broadband
signal, where averaging across the frequency band is
performed via equation (6). The waveforms incident on
the array consisted of one group of coherent signals,
covering half of the total normalized bandwidth. The
desired signal arrives at (15,35) degrees with two
clusters of multipath signals arriving on its different
sides as in the first example.

The  comparison  of  the  incoherent  signal- 
subspace  (ISS)  of  the  two-dimensional  MUSIC 
algorithm  and  the  proposed  technique  is  shown  in 
Figure  2.  The  ISS  for  the  MUSIC  algorithm 
completely  fails  to  resolve  the  DOA  of  the  incoming 
signal,  as  shown  in  the  contour  plot  of  Figure  2  (a). 
The  contour  plot  of  Figure  2  (b)  shows  the  clear 
resolution  of  the  DOA  of  the  direct  path  signal. 

5.  Conclusions 

The problem discussed in this paper is the
estimation of sonobuoy position in the presence of a
highly coherent environment generated by the propeller
scattering of the sonobuoy signals. A two-step
incoherent  subspace  averaging  technique  was  introduced 
which  mitigates  the  effect  of  multipath  on  the  noise 
subspace-based  eigenstructure  methods.  This  technique 
is  based  on  the  knowledge  of  the  propeller  spatial 
coordinates,  relative  to  the  array,  and  the  availability  of 
accurate  modeling  of  the  propeller  multipath 
reflections.  The  two  steps  correspond  to  two  types  of 
averaging.  The  first  is  to  project  a  weighted  sum  of  the 
steering  vector  and  the  source  subspace  representations 
of  the  multipath  spatial  sectors  on  the  noise  subspace. 
The  second  averaging  is  designated  for  broadband 


signals, and is performed over the above projections at
different frequency bands. It was shown that the
proposed technique performs properly in the presence
of one and two sonobuoy signals. The use of coherent
subspace averaging in place of the second step is
expected to improve resolution, but was not explored in
this paper.
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ABSTRACT 

Incoherent and coherent wideband array processing
techniques for aeroacoustic detection and tracking of ground
vehicles are contrasted. Experimental results for a circular
array are presented, illustrating complexity and performance
tradeoffs. Incoherent and coherent MUSIC are used for
comparison. Complexity is dominated in both cases by
singular value decomposition (SVD) calculation, performed M
times for the incoherent case and S times for the coherent case,
where M is the number of frequency bins and S is the number
of look angles. Good results are obtained with the incoherent
method for small M provided adequate narrowband SNR is
available. The coherent approach is more statistically stable,
and S can be reduced by employing a priori coarse direction
estimates.

1.  Introduction 

We  contrast  coherent  and  incoherent  wideband  array 
processing  techniques  for  aeroacoustic  detection  and 
tracking  of  ground  vehicles.  Experimental  results  for  a 
circular  array  of  6  sensors  plus  1  at  the  array  center  are 
presented,  illustrating  complexity  and  performance 
tradeoffs  in  coherent  versus  incoherent  processing. 


Figure 1. Spectrogram of a ground vehicle and helicopter.
(Horizontal axis: time in seconds.)

In  this  application,  array  baselines  are  physically 
constrained  by  system  requirements  and  variable  spatial 
coherence,  motivating  use  of  super-resolution  methods 
[1].  The  problem  is  made  difficult  by  a  number  of 
factors.  Source  acoustic  signatures  are  generally 
nonstationary  and  undergo  severe  fading.  The  usable 


channel  is  largely  restricted  to  [20,200]  Hz  due  to  wind 
noise  at  low  frequencies  and  poor  propagation  at  higher 
ones.  The  channel  response  is  generally  nonstationary 
due  to  a  variety  of  atmospheric  and  terrain  factors. 

There may also be significant time-varying multipath.

A  typical  spectrogram  of  a  moving  vehicle  at  close 
range,  with  a  helicopter  flying  nearby,  is  shown  in  figure 
1.  The  helicopter’s  signature  consists  of  sharp  and  stable 
harmonics  emanating  from  the  main  rotor  blade.  These 
are  evident  in  the  time  interval  50-100  seconds.  The 
ground  vehicle  also  exhibits  a  harmonic  structure  but  it  is 
very  nonstationary  and  exhibits  strong  fades  during 
vehicle maneuvering. Note the lack of acoustic energy
beyond 200 Hz. The combined effects of source and
channel nonstationarities produce significant signal
variability, even at relatively close ranges of hundreds of
meters.

2.  Incoherent  and  Coherent  Processing 

A  natural  extension  of  narrowband  high  resolution 
subspace  methods  is  to  combine  narrowband 
beampatterns  over  many  temporal  frequencies  [2].  This 
approach  is  useful  for  aeroacoustics  if  there  is  sufficient 
SNR  in  multiple  frequency  bins,  such  that  narrowband 
methods  such  as  MUSIC  yield  good  results 
independently  for  each  bin.  In  addition  to  the  relatively 
high  narrowband  SNR  requirement,  disadvantages  of  this 
incoherent  approach  include  degradation  in  the  presence 
of  correlated  multipath,  and  a  general  lack  of  statistical 
stability  when  compared  to  wideband  coherent  methods. 
The  incoherent  averaging  can  lead  to  false  peaks  in  the 
resulting  averaged  beampattern. 

To  overcome  the  nonstationary  nature  of  the  source, 
the  data  is  segmented  before  processing  into  fixed 
blocks,  and  stationarity  is  assumed  over  each  block.  We 
have  found  that  this  is  a  reasonable  assumption  for 
intervals  on  the  order  of  1  sec.  Over  each  processing 
interval  it  is  assumed  that  a  single  frequency  bin  is 
occupied  by  a  single  source  only.  This  takes  advantage 
of  the  nonstationarity  of  the  sources,  and  simplifies  the 
algorithmic  complexity  as  well  as  estimation  of  the 
number  of  sources.  This  assumption  is  justified  because 
different  wideband  sources  are  not  likely  to  occupy  all  of 
the  same  bins  in  any  given  processing  interval,  and 
change  bins  as  a  function  of  time.  In  practice,  the 
direction  of  arrival  (DOA)  estimates  are  fed  into  a 
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tracker  that  is  reasonably  robust  and  therefore  able  to  fill 
in  missing  or  remove  outlying  data. 

Wideband coherent processing gain is possible using
the steered covariance method (STCM) originated by
Wang and Kaveh [3,4]. STCM is based on forming the
composite covariance matrix given by

$$R(\theta) = \sum_{m=1}^{M} T(\omega_m,\theta)\, R_y(\omega_m)\, T^H(\omega_m,\theta) \qquad (1)$$

where M is the number of narrowband frequency bins,
and $R_y(\omega_m)$ is the estimated spatial correlation matrix
at frequency $\omega_m$. The steering or focusing matrix
$T(\omega_m,\theta)$ is a function of both frequency and look angle
$\theta$. Here it is defined as


$$T(\omega_m,\theta) =
\begin{bmatrix}
e^{j2\pi f_m \Delta\tau_1} & 0 & \cdots & 0 \\
0 & e^{j2\pi f_m \Delta\tau_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & e^{j2\pi f_m \Delta\tau_N}
\end{bmatrix} \qquad (2)$$

where $\Delta\tau_i = \frac{d}{c}\sin\phi_i$, $\phi_i = \theta - \alpha_i$, where $\alpha_i$ is the
relative angle to the normal for sensor i for i = 1, 2, ..., N,
d is the radius of the circular array, and c is the speed of
sound in air. Other forms for $T(\omega_m,\theta)$ have been
suggested to reduce focusing errors. The resulting
focused covariance matrix $R(\theta)$ is such that signals in
the respective narrowband correlation matrices are
mapped into the same subspace, yielding coherent
processing gain over multiple frequencies. Conventional
subspace methods such as MUSIC can then be applied to
$R(\theta)$, thus requiring eigenanalysis for each $\theta$.

The complexity of the coherent approach is increased
due to the need for computing $R(\theta)$ for every $\theta$.
However, the computational load can be lowered by
using preliminary estimates of the source locations,
obtained, e.g., by conventional beamforming [3]. Also, it
is assumed that there is at most a single source for a
single look angle $\theta$. As we will see, the relative
computational complexity between the coherent and
incoherent techniques depends on the relative size of the
number of look directions versus the number of
narrowband frequency bins over which wideband
processing occurs.


3.  Implementation 

In  this  section  the  processing  schemes  are  described, 
and  estimates  of  complexity  presented  for  comparing  the 
coherent  and  incoherent  approaches.  In  both  cases 
MUSIC  is  used  as  the  means  of  computing  the 
beampattern. The basic steps are (i) use block-adaptive
pre-processing to adaptively select the narrowband
frequency bins, (ii) apply the incoherent or coherent
technique together with MUSIC, and (iii) estimate the
directions of the sources from the resulting beampatterns.

Let $y_i(n)$ denote the output of the ith sensor from an
array of N sensors, and let $Y_i(k)$ denote DFT{$y_i(n)$}.
The average sum of the $|Y_i(k)|^2$ is obtained in order to
adaptively select frequency bins of interest. This can be
performed in a variety of ways, from simple thresholding
based on bin SNR, to more complex schemes such as
harmonic association. Here, we simply select the M
highest power bins within the range $\omega_{low}$ to $\omega_{high}$.
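The bin selection step can be sketched as follows. This is only an illustration under assumptions not stated in the paper: the function name `select_bins`, the use of a real FFT, and the default band limits (taken from the [20,200] Hz range quoted in the experimental section) are all hypothetical choices.

```python
import numpy as np

def select_bins(y, fs, M, f_lo=20.0, f_hi=200.0):
    """Return indices and frequencies of the M strongest DFT bins in [f_lo, f_hi].

    y  : array of shape (num_sensors, num_samples), one processing block
    fs : sampling rate in Hz
    """
    Y = np.fft.rfft(y, axis=1)                    # Y_i(k) for each sensor i
    power = np.mean(np.abs(Y) ** 2, axis=0)       # average of |Y_i(k)|^2 over sensors
    freqs = np.fft.rfftfreq(y.shape[1], d=1.0 / fs)
    in_band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    order = np.argsort(power[in_band])[::-1]      # strongest in-band bins first
    bins = np.sort(in_band[order[:M]])
    return bins, freqs[bins]
```

Thresholding on bin SNR or harmonic association, both mentioned above, would replace the simple sort in the last three lines.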

The conventional narrowband MUSIC beampattern is
computed M times. For each look angle $\theta$, we compute

$$P(\omega_m,\theta) = \left[E^H(\omega_m,\theta)\,\Pi(\omega_m)\,E(\omega_m,\theta)\right]^{-1} \qquad (3)$$

where $E(\omega_m,\theta) = \mathrm{diag}\{T^H(\omega_m,\theta)\}$ is the steering
vector and the noise orthonormal projector is defined as

$$\Pi(\omega_m) = U(\omega_m)\,U^H(\omega_m) \qquad (4)$$

Taking $R_y(\omega_m)$ to be N x N, then, by assumption, the
noise subspace consists of the N-1 eigenvectors
corresponding to the N-1 smallest eigenvalues of
$R_y(\omega_m)$; these form $U(\omega_m)$.
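A minimal sketch of the incoherent scheme, assuming one source per bin as stated above. The helper names, the speed of sound of 343 m/s, and the sign convention of the steering vector (elements exp(-j 2 pi f dtau_i)) are assumptions made for illustration; an eigendecomposition (`numpy.linalg.eigh`) stands in for the SVD mentioned in the text.

```python
import numpy as np

def steering_vector(f, theta, alpha, d, c=343.0):
    """Assumed array response: exp(-j 2 pi f dtau_i), dtau_i = (d/c) sin(theta - alpha_i)."""
    dtau = (d / c) * np.sin(theta - alpha)
    return np.exp(-2j * np.pi * f * dtau)

def narrowband_music(R, f, thetas, alpha, d):
    """MUSIC beampattern for one frequency bin with a single assumed source."""
    _, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
    U = V[:, :-1]                            # the N-1 noise eigenvectors
    P = np.empty(len(thetas))
    for i, th in enumerate(thetas):
        e = steering_vector(f, th, alpha, d)
        P[i] = 1.0 / np.linalg.norm(U.conj().T @ e) ** 2   # [E^H U U^H E]^-1
    return P

def incoherent_music(R_list, freqs, thetas, alpha, d):
    """Average the M narrowband beampatterns (incoherent wideband MUSIC)."""
    return np.mean([narrowband_music(R, f, thetas, alpha, d)
                    for R, f in zip(R_list, freqs)], axis=0)
```

Each bin contributes equally to the average here, which is why low-SNR bins can pull spurious peaks into the result.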

The computational complexity is approximately
$M[O(N^2) + O(N^3) + S \cdot O(N^2)]$, where M is the number
of frequency bins and S is the number of look angles.
The first squared term in the bracket corresponds to the
formation of the correlation matrix $R_y(\omega_m)$, the cubic
term is for an SVD calculation to form $U(\omega_m)$, and the
last term corresponds to (3), which is computed for each
look angle.

The STCM approach requires focusing as a function
of look direction. Experimental results shown in the next
section are based on computing $R(\theta)$ over 360 degrees in 1
degree steps. After computation of $R(\theta)$ for some
angle $\theta$, the SVD of $R(\theta)$ yields the unitary noise
subspace estimate $U_n(\theta)$. We assume only one target
for each look angle, so that the signal subspace consists
of one eigenvector, with the other N-1 eigenvectors
forming the noise subspace. The coherent wideband
MUSIC spatial spectrum is then calculated via

$$P_{coh}(\theta) = \left[L^H U_n(\theta)\, U_n^H(\theta)\, L\right]^{-1} \qquad (5)$$

where L is an N-element vector of ones.

The computational complexity is approximately
$S[M \cdot O(N^2) + O(N^3) + O(N^2)]$. The first term in the
bracket corresponds to the formation of $R_y(\omega_m)$ and
the focusing operation of (1). Note the diagonal form of
$T(\omega_m,\theta)$ reduces the computation in (1). These
operations must be performed over the range of
frequencies (M), and the range of look angles (S). The
cubic term is for the SVD of $R(\theta)$, and the last term is
for calculation of (5). These are repeated for
each look angle, i.e., S times.
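The coherent scheme can be sketched in the same spirit. The code below is an illustration under assumptions: it takes the focusing matrix to have diagonal entries exp(+j 2 pi f dtau_i) and the array response to be their conjugate, so that focusing maps a source at the look angle onto the all-ones vector; the function name and the value of c are hypothetical, and `eigh` again replaces the SVD.

```python
import numpy as np

def coherent_music(R_list, freqs, thetas, alpha, d, c=343.0):
    """Steered-covariance MUSIC: focus, sum over bins, then one eigenanalysis per look angle."""
    N = R_list[0].shape[0]
    ones = np.ones(N)
    P = np.empty(len(thetas))
    for i, th in enumerate(thetas):
        dtau = (d / c) * np.sin(th - alpha)
        R = np.zeros((N, N), dtype=complex)
        for Ry, f in zip(R_list, freqs):
            t = np.exp(2j * np.pi * f * dtau)            # diagonal of the focusing matrix
            R += t[:, None] * Ry * t.conj()[None, :]     # T Ry T^H using the diagonal form
        _, V = np.linalg.eigh(R)
        Un = V[:, :-1]                                   # noise subspace, one source assumed
        P[i] = 1.0 / np.linalg.norm(Un.conj().T @ ones) ** 2
    return P
```

The inner loop over bins and the eigendecomposition both sit inside the loop over look angles, which is the source of the S multiplier in the complexity estimate.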

For both methods, the most expensive computational
cost is the SVD, which is $O(N^3)$. This term tends to
dominate the complexity comparison. Note that for the
incoherent method it is $M \cdot O(N^3)$ while for the
coherent it is $S \cdot O(N^3)$, so that the relative complexity
is controlled by the relative size of M and S. By
assuming a single target for each distinct frequency bin
(for incoherent processing) and a single target for each
look angle (for coherent processing), we can potentially
apply faster eigenanalysis algorithms than the SVD. To
reduce the number of frequency bins, harmonic line
association techniques can be used to group a set of
frequency bins for each source, with MUSIC then applied
only to the largest narrowband frequency for each set.
To reduce the number of look angles S, coarse angle
estimates can be used to narrow the field of view.
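As a rough worked comparison, the two operation-count expressions can be evaluated with every O(.) constant set to one. That unit-constant choice, the function names, and the example values N = 8, M = 20, S = 360 are assumptions for illustration only.

```python
def incoherent_cost(N, M, S):
    """M * [N^2 + N^3 + S * N^2] with unit constants (an assumption)."""
    return M * (N**2 + N**3 + S * N**2)

def coherent_cost(N, M, S):
    """S * [M * N^2 + N^3 + N^2] with unit constants (an assumption)."""
    return S * (M * N**2 + N**3 + N**2)

# Hypothetical example: 8 sensors, 20 bins, 360 look angles.
print(incoherent_cost(8, 20, 360), coherent_cost(8, 20, 360))
```

With these numbers the eigenanalysis term alone is 20 * 512 for the incoherent method and 360 * 512 for the coherent one, which illustrates how the ratio of S to M drives the comparison.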

4.  Experimental  results 

In  this  section  experimental  results  for  DOA 
estimation  of  ground  vehicles  traveling  on  a  2  km  area 
of  open  grass  field  are  presented.  For  each  test  run,  one 
of  the  vehicles  was  equipped  with  a  GPS  sensor  to 
provide  accurate  positioning  ground  truth.  Figure  2 
shows  raw  experimental  DOA  estimates,  for  a  single 
source,  for  incoherent  and  coherent  wideband  MUSIC 
versus  the  GPS  angles  on  a  test  run  of  250  seconds  in 
length. Mean square error (MSE) and mean absolute
error (MAE) results are shown in table 1 for various sets
of  M  frequency  bins.  The  M  frequency  components  are 
selected  based  on  the  highest  bin  SNR’s  in  the  frequency 
range  of  [20,200]  Hz. 

The MSE's and MAE's are calculated with the
outliers removed using the criterion

$$|e - \mathrm{median}(e)| > 3\sigma \qquad (8)$$

where e is the angle error or angle difference between
the DOA estimate and the true angle measured by GPS,
and $\sigma$ is the mean absolute deviation [5]. An
example of this is shown in figure 3, with $\pm 3\sigma$
shown as straight lines. The outliers can be caused by
several factors including fading, wind noise and acoustic
source variations. For the error analysis in table 1, the
number of outliers ranges from 15 to 24 out of a total of
125 processing intervals, each of length 1 sec, with a
sampling rate of 1 kHz and 1024-pt FFT's. For M = 1,
incoherent and coherent wideband MUSIC reduce to the
narrowband case. Processing gain is evident for both
methods, in that the estimates generally improve for
increasing M. For this single source experiment, the
incoherent approach produced smaller errors both in
terms of MSE and MAE, reflecting the generally high
SNR in this experiment.
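The outlier rule in (8) can be sketched as below. The paper does not spell out how the mean absolute deviation is centered; this sketch assumes it is taken about the median, and the function names are hypothetical.

```python
from statistics import fmean, median

def remove_outliers(errors, k=3.0):
    """Drop errors with |e - median(e)| > k * sigma.

    sigma is taken here as the mean absolute deviation about the median
    (an assumption; the text only says "mean absolute deviation").
    """
    med = median(errors)
    sigma = fmean(abs(e - med) for e in errors)
    kept = [e for e in errors if abs(e - med) <= k * sigma]
    return kept, len(errors) - len(kept)

def mse_mae(errors):
    """Mean square error and mean absolute error of the retained angle errors."""
    return fmean(e * e for e in errors), fmean(abs(e) for e in errors)
```

A typical use is `kept, n_out = remove_outliers(angle_errors)` followed by `mse_mae(kept)`.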
Figure 2: Raw DOA estimates for (a) incoherent and (b) coherent
wideband MUSIC for M=50 and GPS ground truth.


            Incoherent MUSIC      Coherent MUSIC
   M        MSE       MAE         MSE       MAE
   1        3.558     1.419       3.558     1.419
   10       2.144     1.083       3.948     1.422
   20       1.235     0.870       3.684     1.366
   50       1.221     0.863       2.345     1.130
   100      1.172     0.838       2.684     1.178

Table 1. MSE and MAE for wideband processing over M frequency
bins between [20,200] Hz.


While  the  single  source,  high  SNR  case  can  be 
handled  with  incoherent  MUSIC  and  small  M,  the 
situation  changes  with  multiple  sources  and  low  SNR’s. 

A  two-source  example  is  illustrated  in  figure  4,  with 
sources  at  50  and  180  degrees.  Here  beampatterns  are 
shown  for  a  single  processing  interval,  with  M  varying 
over  10,  20,  50  and  100.  The  incoherent  method 
accurately  locates  the  directions  of  the  sources  for  all 
four  cases.  It  produces  more  distinct  and  sharp 
beampatterns  than  the  coherent  method.  However,  for 
M=50  (figure  4c)  and  especially  M=100  (figure  4d),  the 
incoherent  method  produced  additional  spurious  peaks  in 
the  beampattern  that  can  be  misconstrued  as  sources. 

The  explanation  for  this  behavior  is  that  there  is  high 
SNR  in  only  a  few  of  the  spectral  components  and  no 
significant  SNR  elsewhere  in  the  data  signature.  By 
incoherently  averaging  additional  beampatterns  from  low 
SNR  spectral  components,  the  overall  beampattern 
degrades.  The  beampattern  for  the  coherent  method  on 
the  other  hand,  becomes  more  pronounced  as  M 
increases,  and  exhibits  very  good  statistical  stability. 
Figure 3: DOA error estimates for incoherent wideband MUSIC
(M=50) illustrating outlier removal for MSE and MAE calculation.
Plot title: MSE=1.221, MAE=0.863, 3SigMAD=3.394, 17 outliers.

Figure 4: DOA spectra estimates for 2 targets located at $\theta$=50 and
$\theta$=180 for incoherent (thin line) and coherent wideband MUSIC for (a)
M=10 (b) M=20 (c) M=50 and (d) M=100.


5.  Conclusions 

Both the incoherent and coherent wideband MUSIC
methods  provide  processing  gain  over  narrowband 
MUSIC,  as  exhibited  by  experiment.  Here  the  sources 
used  are  generally  characterized  as  a  sum  of  narrowband 
frequency  components  for  a  majority  of  the  time.  Thus, 
given  adequate  SNR,  incoherent  MUSIC  performed  well 
and  yielded  sharp  and  distinct  peaks  in  the  beampattern. 
However,  frequency  selection  is  an  issue  as  the  inclusion 
of  low  SNR  bins  tends  to  degrade  the  resulting 
beampattern,  reducing  source  peaks  and  introducing 
spurious  ones.  In  contrast,  the  coherent  MUSIC 
approach  is  much  more  statistically  stable,  with  a 
beampattern  that  generally  improves  (rather  than 
degrades)  with  the  addition  of  lower  SNR  bins. 

However,  the  coherent  approach  requires  more  frequency 
bins  be  included  (i.e.,  larger  M)  to  achieve  the  same 
accuracy,  although  this  is  a  function  of  SNR  as  well.  The 


bias  introduced  in  the  coherent  processing  has  been 
ignored,  and  results  in  table  1  may  partly  reflect  this  fact. 
It  appears  that  the  coherent  approach  will  degrade  more 
gracefully  as  the  SNR  is  decreased.  Thus,  further 
experiments  are  warranted  for  lower  SNR  (longer  range) 
cases,  as  well  as  including  sources  that  do  not  exhibit 
strong  narrowband  signatures. 

The  computational  complexity  comparison  between 
the  two  methods  is  largely  governed  by  the  SVD 
calculation which is $O(N^3)$, with a multiplier given by
the  number  of  spectral  components  M  (incoherent)  or 
number  of  look  angles  S  (coherent).  As  we  have  seen,  M 
can  be  made  small  when  the  source  signatures  consist  of 
a  sum  of  high  SNR  narrowband  components,  enabling 
use  of  the  incoherent  approach.  The  number  of  look 
angles  can  be  reduced  by  incorporating  coarse  DOA 
estimates  obtained  in  a  preprocessing  step  such  as  a 
conventional  beamformer.  Thus,  the  complexity  of  the 
coherent  approach  can  be  made  manageable  (with 
respect  to  the  incoherent  complexity),  and  is  likely  to  be 
warranted  for  more  difficult  sources  at  longer  ranges. 

It  is  of  interest  to  consider  methods  for  calculating 
the  signal  subspace  only,  as  opposed  to  the  full  SVD 
calculation,  because  in  both  methods  the  signal  subspace 
is  assumed  to  consist  of  one  component  only.  Further 
work  of  interest  includes  reducing  computation  by 
exploiting  the  radial  symmetry  of  the  circular  array,  e.g., 
see  Doron  [6],  and  effects  of  calibration  and  sensor 
placement errors, e.g., see Swindlehurst [7]. We note
that  only  rudimentary  effort  was  made  to  calibrate  the 
array  used  in  the  experiments. 
Abstract

This paper is focused on the design of partial response
equalized channels and maximum-likelihood sequence
detection for high density magnetic recording
systems. Methods for designing linearly equalized partial
response channels are presented and applied to a
Lorentzian model for the magnetic recording channel.
Reduced-state maximum-likelihood sequence detection
methods are employed for the partial response equalized
channels and the error rate performances of the detectors
are evaluated.

1.  Introduction 

A major factor that limits the density of magnetic
recording (MR) systems is intersymbol interference (ISI).
To reduce the effects of ISI on high density MR channels,
various types of equalizers have been employed
[1]. Among these are linear equalizers (LE), decision-feedback
equalizers (DFE), and maximum-likelihood
sequence detection (MLSD). The latter is efficiently
implemented by means of the Viterbi Algorithm (VA).

MLSD is known to be the optimum detection criterion
in a channel with ISI, in the sense that the error
rate for a sequence of symbols is smallest among the
class of equalization methods. However, the computational
complexity of the MLSD criterion increases
exponentially with the length of the channel memory [2].
Hence, when the span of the ISI is large, the computational
complexity of MLSD becomes prohibitive. On
the other hand, a LE is significantly simpler to implement.
Its major limitation is that it enhances the additive
noise in a channel with ISI. The loss in performance
of a LE due to noise enhancement is unacceptably high
in a high density MR system.

A commonly used method for reducing the computational
complexity of MLSD is to combine a LE with
MLSD. In particular, the LE is used to equalize the MR
channel to a partial response of the type $(1-D)(1+D)^n$,
where D represents a delay of one symbol and n is a
non-negative integer that is selected to take the values
n = 0, 1, 2, .... In general, the optimum choice of n
increases as the density of the MR system is increased.
By employing a LE to equalize the MR channel to a
partial response of short duration, the noise enhancement
of the LE is significantly reduced compared to a
full response LE. The MLSD that follows the LE is used
to detect the data symbols in the partial response signal.
Thus, the combination of a LE-MLSD (or LE-VA)
is a practical method for achieving high density magnetic
recording with a lower computational complexity
than MLSD.

In this paper, we investigate another partial response
(PR) equalization method for reducing the computational
complexity of the MLSD method. For the
Lorentzian channel model, we determine the optimum
PR targets and the corresponding noise enhancement
values for different lengths of the equalized channel.
We have found that the optimum method for designing
the PR equalized target results in a lower noise
enhancement compared to the PR target $(1-D)(1+D)^n$.
However, the noise reduction is achieved at the expense
of an increase in the length of the equalized PR target.
For PR targets of large length, the computational
complexity of MLSD is still prohibitive for high rate
MR systems. We have addressed the problem by
investigating reduced state MLSD methods. We have
found that delayed decision-feedback sequence estimation
(DDFSE) as described in [3] is particularly effective
for the PR equalized channel.

2.  Optimization  of  the  LE 

With the conventional LE-VA technique, a linear
prefilter (P) or LE is used to adjust the channel (H) to
a desired impulse response (Q), which is seen by the
VA.

Figure 1: Channel truncation using linear prefilter

We consider methods for selecting P. In all cases,
the LE is an FIR filter of length 2M+1. Furthermore,
the channel is also modeled as an FIR filter.

A. Optimization of the LE for a Specified
DIR

The mean-squared error (MSE) at the output of the
LE may be expressed as:

$$\mathrm{MSE} = P'AP + Q'Q - 2P'HQ \qquad (1)$$

where P is the LE impulse response vector, Q is the DIR
vector, H is the channel impulse response vector, A
is the channel covariance matrix with elements $a_{ij} = \overline{r_i r_j}$, and

$$P = [p_{-M} \cdots p_M]$$

$$Q = [q_0 \cdots q_{L-1}]$$

$$H = [\,\cdots\, h_N]$$

For a specified DIR Q, the impulse response P of
the LE that minimizes the MSE is

$$P = A^{-1}HQ \qquad (2)$$

B. Optimization of the LE for a DIR with
Energy Constraint

Falconer and Magee [4] considered the problem of
finding the optimum LE response that minimizes the
MSE in (1) where the DIR of a specified length is
constrained such that Q'Q = 1. The solution to the
optimization problem is also given by (2), where the DIR
Q is the eigenvector of the matrix $(I - H'A^{-1}H)$
corresponding to the minimum eigenvalue.

C. Optimization of the LE for a DIR with
Element Constraint

Suppose we specify the first element $q_0$ of the DIR
to be unity and leave the remaining elements of Q
unspecified. That is, $Q = [1\; q_1\; q_2 \cdots q_{L-1}]$. The LE
impulse response P and the remaining values of Q are
selected to minimize the MSE

$$\mathrm{MSE} = P'AP + Q'Q - 2P'HQ - 2\lambda(J'Q - 1) \qquad (3)$$

where J is an L-element column vector whose first element
is 1 and all the other elements are zero. Taking
the derivatives of the right hand side of (3) with respect
to P, Q, and $\lambda$, respectively, and setting the resulting
expressions to zero, we obtain:

$$Q_{opt} = \lambda\,(I - H'A^{-1}H)^{-1} J \qquad (4)$$

$$\lambda = \left[J'(I - H'A^{-1}H)^{-1} J\right]^{-1} \qquad (5)$$

$$P_{opt} = A^{-1}HQ_{opt} \qquad (6)$$

where I is an identity matrix and $\lambda$ is equal to the
minimum mean-squared-error.

3. Linear Equalizer for MR Channels

The magnetic recording (MR) channel is modeled
as a linear filter whose step response is the Lorentzian
pulse

$$s(t) = \frac{1}{1 + \left(\frac{2t}{pw_{50}}\right)^2} \qquad (7)$$

where $pw_{50}$ is the "half-amplitude width" of the pulse,
which is equivalent to the amount of time that s(t) is
greater than or equal to half of its peak value. The
input to this channel is a binary data sequence $\{a_k = \pm 1\}$,
and the channel output is assumed to be corrupted by
additive white Gaussian noise (AWGN). The bit rate is
$1/T_b$, where $T_b$ is the bit interval. The ratio $S = pw_{50}/T_b$
is called the normalized information density. For the
continuous-time system model, Bergmans [5] has derived
an equivalent discrete-time system model which
is illustrated in Fig. 2, where the parameters are

(8)

$$h_k = g_k - g_{k-1} \qquad (9)$$

Figure 2: Discrete-time MR system model
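The three linear-equalizer designs of Section 2 (specified DIR, energy constraint, element constraint) can be sketched numerically as follows. This is an illustration under assumptions: the data symbols are taken as white with unit variance, the noise as white with variance `noise_var`, and the matrices A and H are built from a causal FIR channel `h` in the way shown; all function names are hypothetical.

```python
import numpy as np

def le_matrices(h, M, L, noise_var):
    """Covariance A (2M+1 x 2M+1) and cross-correlation H (2M+1 x L), assuming white unit-variance data."""
    h = np.asarray(h, dtype=float)
    hv = lambda k: h[k] if 0 <= k < len(h) else 0.0
    acorr = lambda lag: sum(hv(j) * hv(j + lag) for j in range(len(h)))
    taps = range(-M, M + 1)
    A = np.array([[acorr(abs(i - k)) for k in taps] for i in taps])
    A += noise_var * np.eye(2 * M + 1)
    H = np.array([[hv(l - i) for l in range(L)] for i in taps])
    return A, H

def error_matrix(A, H):
    """I - H' A^-1 H, the matrix whose quadratic form gives the MSE for a given DIR."""
    return np.eye(H.shape[1]) - H.T @ np.linalg.solve(A, H)

def optimum_le(A, H, Q):
    """LE taps for a specified DIR: P = A^-1 H Q."""
    return np.linalg.solve(A, H @ Q)

def dir_energy_constraint(A, H):
    """DIR with Q'Q = 1: eigenvector for the minimum eigenvalue."""
    w, V = np.linalg.eigh(error_matrix(A, H))
    return V[:, 0], w[0]

def dir_element_constraint(A, H):
    """DIR with first element fixed to one; also returns lambda, the minimum MSE."""
    J = np.zeros(H.shape[1])
    J[0] = 1.0
    x = np.linalg.solve(error_matrix(A, H), J)
    lam = 1.0 / x[0]
    return lam * x, lam

def mse(A, H, P, Q):
    """P'AP + Q'Q - 2P'HQ."""
    return P @ A @ P + Q @ Q - 2.0 * P @ H @ Q
```

For either constrained design, `optimum_le(A, H, Q)` then gives the matching equalizer taps, and `mse` can be used to confirm that the attained error equals the eigenvalue or the multiplier returned by the design routine.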

4.  Performance  of  the  LE-VA  system 
When the LE is followed by the MLSD that is
efficiently implemented by the VA, the error probability
of the system may be approximated by

(10)

where K is a constant that depends on the characteristics
of the error events and MSE is the mean-squared-error at
the output of the LE. $d_{\min}$ is the minimum Euclidean
distance for error events.

We note that the probability of error depends on
the ratio

$$R_a = \frac{d_{\min}^2}{8\,\mathrm{MSE}} \qquad (11)$$

Here, the ratio may be used as a performance index for
comparing MR systems with different DIR. It should
be emphasized that in using the ratio $R_a$ as a performance
index, we have ignored the fact that the noise at
the input to the VA is generally correlated. The VA is
assumed to ignore this correlation in the computation
of the metrics.
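The performance index can be evaluated for a given DIR with a small brute-force search for the minimum distance. This sketch assumes binary inputs of +1 and -1, so that error symbols take values in {-2, 0, +2}, and it only searches error events up to a bounded length `max_len`; both the bound and the function names are illustrative choices.

```python
from itertools import product

def dmin_squared(q, max_len=6):
    """Smallest squared Euclidean distance over error events of length <= max_len."""
    best = float("inf")
    for n in range(1, max_len + 1):
        for tail in product((-2, 0, 2), repeat=n - 1):
            e = (2,) + tail                      # first error fixed to +2 by sign symmetry
            out = [sum(q[i] * e[k - i] for i in range(len(q)) if 0 <= k - i < n)
                   for k in range(n + len(q) - 1)]
            best = min(best, sum(v * v for v in out))
    return best

def performance_ratio(q, mse, max_len=6):
    """Performance index: dmin^2 / (8 * MSE)."""
    return dmin_squared(q, max_len) / (8.0 * mse)
```

For example, the targets (1-D)(1+D) and (1-D)(1+D)^2 correspond to `q = [1, 0, -1]` and `q = [1, 1, -1, -1]`.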


5. Reduced-state MLSD


We have observed that by constraining the DIR to
have the characteristics $Q = [1\; q_1 \cdots q_{L-1}]$, most
of the energy in the DIR is contained in the first few
coefficients, say $q_1, q_2, \cdots, q_{L_2}$, where $L_2 < L$. This
observation suggests that we reduce the computational
complexity of the VA by truncating the channel response.
Duel-Hallen and Heegard [3] have described an algorithm,
called delayed decision-feedback sequence estimation
(DDFSE), which performs channel truncation.
The complexity of the DDFSE algorithm is controlled
by a parameter $\mu$ that can be varied from zero to the
memory of the channel (in our case, the DIR). The
algorithm is based on a trellis search with the number of
states equal to $2^{\mu}$. When $\mu$ = 1, the DDFSE reduces
to the DFE. When $\mu$ equals the length of the channel
(DIR), the DDFSE is identical to the full complexity
VA. Hence, for 1 < $\mu$ < L, the DDFSE is a reduced-state
VA with feedback incorporated into the structure
of path metric computations.
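A compact sketch of a reduced-state detector of this kind is given below. It is not the algorithm as specified in [3] but a minimal illustration under assumptions: symbols are +1 or -1, the symbols before the data block are a known preamble of -1, and `mu` counts the symbols kept in the trellis state, so `mu = 0` collapses the trellis to a single state (pure decision feedback) and `mu = len(q) - 1` gives the full trellis. This indexing may be offset from the one used in the text.

```python
def ddfse(y, q, mu, preamble=-1):
    """Detect +/-1 symbols from y using a 2**mu-state trellis with per-survivor feedback."""
    L = len(q)
    assert 0 <= mu <= L - 1
    pad = [preamble] * (L - 1)                    # assumed known symbols before the data
    survivors = {tuple(pad[len(pad) - mu:]): (0.0, pad)}
    for yk in y:
        nxt = {}
        for metric, hist in survivors.values():
            # ISI from past symbols: the newest mu are fixed by the state, the
            # older ones come from this survivor's own past decisions.
            isi = sum(q[i] * hist[-i] for i in range(1, L))
            for a in (-1, 1):
                m = metric + (yk - (q[0] * a + isi)) ** 2
                h = hist + [a]
                s = tuple(h[len(h) - mu:])        # next state: last mu symbols
                if s not in nxt or m < nxt[s][0]:
                    nxt[s] = (m, h)
        survivors = nxt
    best = min(survivors.values(), key=lambda v: v[0])
    return best[1][L - 1:]                        # strip the preamble
```

Storing the whole decision history with each survivor keeps the sketch short; a practical detector would use a bounded traceback instead.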

In the Duel-Hallen and Heegard paper [3], it is
suggested that the channel be minimum phase. However,
this condition is not necessary. As long as most of the
channel energy is contained in the first few coefficients
of Q, the parameter $\mu$ can be selected accordingly to
include the larger terms.

For the MR channel with S = 3 and S = 4, the
designed DIR's with element constraint are shown in
Tables 1 and 2. For these two cases, we have selected
$\mu = L_2 = 3$ and $\mu = L_2 = 4$, respectively. The values
of $d_{\min}^2$ for the truncated channel and the ratio $R_a$ are
given in Tables 3 and 4. For comparison, we also give
the corresponding values of $d_{\min}^2$ for the LE-VA
when Q is chosen as $(1-D)(1+D)^n$ in Table 5 and Table
6. From Tables 1 and 2, we can see the main energy of
the DIR is contained in the first few coefficients of Q.
We note that for L > 9 and S = 3 and for L > 12 and S =
4, there are relatively small performance gains.

6.  Simulation  Results 

We used the two estimation methods, LE-VA with
Q chosen as $(1-D)(1+D)^n$, and LE-DDFSE, to perform
simulations with the MR channel S=3 and S=4. The
performances  are  shown  in  Fig.  3  and  Fig.  4.  The  in¬ 
put  SNR  is  defined  as  (SNR)i„  =  ^.  We  can  see  the 
LE-DDFSE has a 2 dB and 3 dB performance improvement
over the LE-VA for S=3 and S=4, respectively.

7.  Conclusion 

In this paper we have presented a method for
reducing the computational complexity of the MLSD for
high density MR systems. We showed that by proper
design of the DIR, a LE followed by a DDFSE algorithm
yields superior performance to existing methods.
Figure 3: The simulation results for
S=3 with LE-DDFSE and LE-VA

Abstract

The formation of microcracks in a material creates
propagating ultrasonic waves that are called Acoustic
Emissions (AEs). These AEs provide an early warning
of the onset of material failure. In practical cases, however,
these AEs have to be detected at very low SNRs,
amongst strong interference and random noise. This
paper presents some preliminary results from an ongoing
investigation into the modeling and detection of
AEs as a viable technique for predictive diagnostics.


1  Introduction 

Automatic monitoring techniques are being considered
as a means to safely simplify or dispense with periodic
fault inspection procedures. One such automatic
monitoring technique is based on the detection of AEs
that are generated due to the formation of microcracks
in a material.

AE signals have been extensively studied (e.g.
[7][2][1]). However, these studies used data acquired
from isolated material specimens in controlled
laboratory conditions. Hence, they do not directly relate
to a practical case wherein the AE signal has to be
detected in the presence of strong interference caused
by mechanical motion in the machine. This paper
addresses the problem of detecting the AE signal in
such a "real world" scenario. The various stages of the
proposed procedure are shown in Figure 1. The paper
presents some preliminary results obtained on real AE¹
and interference data.

*This work was supported in part by ONR under URL

¹Thanks to Professor Gerberich and David Bahr of the Material
Science Department for their kind assistance in providing
us with data.


Figure 1. Block diagram of proposed procedure
for AE signal processing


2  Acoustic  Emissions 

AEs are transient in nature, and can be modeled as
a sum of decaying complex exponentials [4] as,

$$s(t) = \sum_{k=1}^{K} A_k\, e^{\alpha_k [t-\tau]} \cos(2\pi f_k [t-\tau] + \phi_k)\, u(t-\tau) \qquad (1)$$

where u(t) is the step function, and $A_k$, $\phi_k$, $\alpha_k$, and $f_k$
are amplitude, phase, decay rate, and frequency of the
kth AE signal component. AE signals may be broad
band, with energy ranging as high as several MHz [7].

Figure 2 shows an AE obtained from a 600 nm thick
Tantalum Nitride specimen deposited on sapphire, and
the corresponding model obtained via Prony's method
[3]. The estimated parameters are tabulated in Table 1.

This  AE  was  generated  by  a  microcrack  that  was 
initiated  using  a  nanoindenter  (a  device  which  pushes 
a  diamond  tip  into  a  material  in  a  controlled  manner). 
Figure 2. Tantalum Nitride acoustic emission,
measured (dotted) and model (solid).


Table 1. Exponentially decaying sinusoid parameters
corresponding to Tantalum Nitride AE.

   A (V)     phi (radians)    alpha (/sec)    f (Hz)
   0.102     0.075            -2.225e+04      1.057e+05
   0.099     2.720            -8.116e+04      6.218e+04
   0.080     1.938            -7.214e+04      3.923e+04

The  setup  to  generate  such  a  “microevent”  is  shown  in 
Figure  3. 
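The fitted model can be evaluated directly from the Table 1 parameters. The sketch below assumes the decay enters as exp(alpha * (t - tau)) with the negative alpha values listed in the table, takes the onset time tau as zero, and treats the step function as one at t = tau; the names `TANTALUM_NITRIDE` and `ae_model` are hypothetical.

```python
import math

# (A in V, phi in radians, alpha in 1/sec, f in Hz), from Table 1
TANTALUM_NITRIDE = [
    (0.102, 0.075, -2.225e4, 1.057e5),
    (0.099, 2.720, -8.116e4, 6.218e4),
    (0.080, 1.938, -7.214e4, 3.923e4),
]

def ae_model(t, params=TANTALUM_NITRIDE, tau=0.0):
    """Sum of exponentially decaying sinusoids that switches on at t = tau."""
    if t < tau:
        return 0.0                       # u(t - tau) = 0 before the onset
    dt = t - tau
    return sum(A * math.exp(alpha * dt) * math.cos(2 * math.pi * f * dt + phi)
               for A, phi, alpha, f in params)

# Example: 200 microseconds of the modeled emission sampled at 2 MHz
samples = [ae_model(n / 2.0e6) for n in range(400)]
```

With these decay rates the slowest component falls to about one percent of its initial amplitude after roughly 0.2 ms.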

3  Prefiltering 

We are looking at scenarios where the AE signal is
buried in strong interference (which could be periodic)
due to mechanical motion, like the movement of a piston
in an engine or the rotation of helicopter blades.
In consequence, it becomes necessary to first mitigate
this interference using a suitable prefiltering technique.

Figure 4 shows typical interference data recorded
from a lawnmower at 2 MHz. The duration of the
observation was one-twentieth of a second. The power
spectrum of this lawnmower noise is dominated by the
power at frequencies below 30 kHz. Hence, it would
be reasonable to conclude that a simple high pass filter
with a cutoff of about 30 kHz should suffice to filter out
this periodic noise from a Tantalum Nitride AE, which
has a single dominant component at about 105 kHz.

Figure 3. Setup to generate a microevent

Figure 4. Measured lawnmower noise

Figure 5. Typical input of prefilter
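A high-pass prefilter of this kind can be illustrated with an idealized frequency-domain filter. The paper does not say which filter design was used; the brick-wall FFT approach and the function name below are assumptions chosen to keep the sketch short.

```python
import numpy as np

def highpass_fft(x, fs, f_cut):
    """Zero every DFT bin below f_cut (an idealized brick-wall high-pass filter)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[f < f_cut] = 0.0
    return np.fft.irfft(X, n=len(x))

# Example: one-twentieth of a second at 2 MHz, strong 5 kHz interference plus a weak 105 kHz tone
fs = 2.0e6
t = np.arange(100000) / fs
interference = 5.0 * np.sin(2 * np.pi * 5.0e3 * t)
tone = 0.1 * np.sin(2 * np.pi * 105.0e3 * t)
filtered = highpass_fft(interference + tone, fs, f_cut=30.0e3)
```

A causal IIR or FIR high-pass filter would be the usual choice for streaming data; the block-based version here only shows the effect of the 30 kHz cutoff.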

Another possible method for performing the prefiltering
is to use a linear prediction (LP) filter designed
using the method in [8], using the first 8000 samples of
the data. The coefficients can be updated or redesigned
after a certain period of time to reflect any changes in
the characteristics of the machine noise. We found that
LP filtering worked better than an HP filter for some
types of AEs (cf. Figures 5 and 6). Of course, a
combination of HP filtering and LP filtering can also
be considered.
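The idea behind the LP prefilter can be sketched with an ordinary least-squares linear predictor. This is not the design method of [8], which is not described here; the least-squares fit, the filter order, and the function names are assumptions for illustration. The predictor is trained on an initial segment of interference, and the prediction error is passed on as the prefilter output.

```python
import numpy as np

def lp_coefficients(train, order):
    """Least-squares predictor: x[n] is approximated by sum_i a[i] * x[n - 1 - i]."""
    train = np.asarray(train, dtype=float)
    X = np.column_stack([train[order - i - 1 : len(train) - i - 1] for i in range(order)])
    a, *_ = np.linalg.lstsq(X, train[order:], rcond=None)
    return a

def prediction_error(x, a):
    """Prediction-error (prefilter) output; the first len(a) samples are set to zero."""
    x = np.asarray(x, dtype=float)
    p = len(a)
    pred = np.zeros_like(x)
    for i in range(p):
        pred[p:] += a[i] * x[p - i - 1 : len(x) - i - 1]
    e = x - pred
    e[:p] = 0.0
    return e
```

Periodic interference is highly predictable and is largely cancelled, whereas a sudden transient cannot be predicted from past samples and therefore survives in the error signal.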

4  Detection 

At  the  output  of  the  prefilter  the  AE  signal,  if 
present,  is  usually  buried  in  additive  noise  at  very  low 
SNRs.  In  this  section,  we  present  results  from  two 
methods  that  we  have  considered  for  possible  robust 
detection  of  the  signal. 
Figure 6. Output of linear prediction filter

Figure 7. Tantalum Nitride signal embedded
in lawnmower noise.

Figure 8. F-statistic for Tantalum Nitride signal
embedded in lawnmower noise.


4.1  Optimal  Tapers 

We applied the techniques described in [5] to determine
whether each exponentially decaying sinusoid
detected via Prony's method was actually present, or a
false reading produced by noise. The method consists
of applying an optimal window function to the data
(optimal in the sense that it minimizes spectral leakage
for constant SNR), then testing the fit of the model
to the data with an F-statistic.

The optimal windows are determined by three parameters:
the time-bandwidth product P, a noise parameter $\nu$,
and an exponential decay parameter. We chose
time-bandwidth product P = 4 (NW = 8$\pi$). Experiments
showed that the noise parameter $\nu$ had a small effect on
the final result, so we let $\nu$ = 0. The exponential decay
parameter was chosen to match the value returned from
Prony's method.

The method works well for high SNR. The Tantalum
Nitride signal was embedded in lawnmower noise
(cf. Figure 4), as shown in Figure 7. Figure 8 shows
the resulting F-statistic. There is a clear peak near
106 kHz, the frequency identified by Prony's method.


4.2  Dominant  Component 

At  the  output  of  the  prefilter,  the  AE  signal  (if 
present),  is  usually  buried  in  additive  noise  at  very  low 
SNRs.  Under  the  assumption  that  both  the  noise  and 
the  AE  can  be  modeled  as  Gaussian  random  vectors, 
we  can  state  the  detection  hypothesis  problem  as, 

$$H_0: Y = N, \qquad H_1: Y = N + \theta S \tag{2}$$

where $N \sim \mathcal{N}(\mu_n, R_n)$ is the noise vector and $S \sim \mathcal{N}(\mu_s, R_s)$ is the signal (AE) vector. $\theta$ is the unknown amplitude of the signal vector. No uniformly most powerful (UMP) test exists for the above hypothesis with respect to $\theta$. However, a locally most powerful (LMP) test can be found assuming low SNRs.

The LMP test statistic $T_{lo}$ for the above hypothesis after pre-whitening of the additive Gaussian noise is given by ([6]),

$$T_{lo}(y) \propto y^T R_s\, y \tag{3}$$

where $y$ is the observation vector. This is equivalent to,

$$T_{lo}(y) \propto \sum_{k=1}^{r} \lambda_k \langle y, v_k \rangle^2 \tag{4}$$

where $\lambda_k$ is the $k$th eigenvalue and $v_k$ is the $k$th eigenvector of $R_s$. $r$ is the rank of $R_s$. The decision statistic $T_{lo}$ given in (4) can be implemented with a bank of $r$ causal, linear filters in parallel as shown in Figure 9. The impulse response of the $k$th channel filter is given by,

$$h_k(n) = v_k(N - n) \tag{5}$$

where $n = 1, 2, \ldots, N$. $N$ is the number of samples in a data block. The output of each channel is squared and
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Figure  9.  Implementation  of  test  statistic 




Figure  10.  Output  of  prefilter 


the  test  statistic  is  obtained  by  a  linear  combination 
(with  the  corresponding  eigenvalues  as  the  weights)  of 
these  squared  outputs. 
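
The filter-bank form of (4) and (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes real-valued, zero-mean, already pre-whitened data, and the function names (`eigenfilter_bank`, `lmp_statistic`, `sliding_statistic`) are hypothetical.

```python
import numpy as np

def eigenfilter_bank(signal_examples, rank):
    """Estimate R_s from example blocks and return its `rank` largest eigenpairs."""
    X = np.asarray(signal_examples, dtype=float)   # shape (num_examples, N)
    R_s = X.T @ X / X.shape[0]                     # sample covariance (zero mean assumed)
    eigvals, eigvecs = np.linalg.eigh(R_s)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:rank]
    return eigvals[order], eigvecs[:, order]       # lambda_k and v_k (as columns)

def lmp_statistic(y, eigvals, eigvecs):
    """Block form of (4): sum_k lambda_k <y, v_k>^2 for one block y of length N."""
    projections = eigvecs.T @ y
    return float(np.sum(eigvals * projections ** 2))

def sliding_statistic(x, eigvals, eigvecs):
    """Filter-bank form: one FIR filter per eigenvector, squared and weighted."""
    N = eigvecs.shape[0]
    T = np.zeros(len(x) - N + 1)
    for lam, v in zip(eigvals, eigvecs.T):
        # Filtering with the time-reversed eigenvector, as in (5), equals
        # correlating each length-N block of x with v_k.
        out = np.convolve(x, v[::-1], mode="valid")
        T += lam * out ** 2
    return T
```

Entry `m` of `sliding_statistic` equals `lmp_statistic` evaluated on the block `x[m:m+N]`, so the two forms can be cross-checked against each other.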

Four AEs from the Tantalum Nitride specimen were used to estimate the signal covariance matrix. The rank of the estimated covariance matrix was found to be 2. An AE signal from a Tantalum Nitride specimen was then added to the lawnmower noise to simulate a practical case, wherein the AE signal is measured at very low SNRs amidst strong correlated noise. It is important to note that the AE signal which was added to the noise was not one of the AE signals used to estimate the covariance matrix. Since the rank is 2, only two channels of filtering are required. The impulse responses of these two filters are given by (5). A typical output of the prefilter is shown in Figure 10. The AE signal (which starts at $n = 55000$) is buried in noise. The corresponding test statistic obtained with the above output of the prefilter is plotted in Figure 11. It is evident that the test statistic performs quite well in identifying the occurrence of the acoustic emission.




Figure  11.  Test  statistic  (Equation  4) 

5  Conclusions 

Some preliminary results from our investigation into AEs as a viable technique for predictive diagnostics have been presented. HP filtering and LP filtering approaches have been considered for prefiltering. Two possible detection methods have also been considered. The choice of the technique to be used for prefiltering and detection is heavily dictated by the specific application under consideration.

Classification and crack localization are possible future directions in our research.
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Abstract 

Developing fast and robust methods for identifying multiple FIR channels driven by an unknown common source is important for wireless communications. In this letter, we present a new method that exploits a minimum noise subspace (MNS). The MNS is computed from a set of channel output pairs which form a "tree". The "tree" exploits with minimum redundancy the diversity among all channels. The MNS method is much more efficient in computation and only slightly less robust to channel noise than the subspace method by Moulines et al.


1.  Introduction 

Blind identification of a multiple-channel FIR system driven by a common source has recently received much attention due to its potential applications in wireless communications. In contrast to the traditional cost-function based adaptive approaches and the more recent higher order statistics (HOS) based methods, the second order statistics (SOS) based methods appear to be a "hot" topic in this community, e.g., see [2]. Apparently, this trend started from the work by Tong et al. [3]. Among many SOS based methods known so far, the subspace (SS) method by Moulines et al. [1] is an outstanding one. The SS method applies the MUSIC concept to a relation between the channel impulse responses and the noise subspace associated with a covariance matrix of the system output. In this paper, we present a new variation of the SS method. Instead of exploiting the full noise subspace, this new method exploits a minimum noise subspace (MNS). The MNS method represents a solid extension of an observation made by Moulines et al. [1] that the full noise subspace of the system output covariance matrix is generally not necessary to asymptotically yield the unique (up to a constant) estimate of channel responses. We will show that the minimum dimension of the noise subspace required for unique system identification is M-1, where M is the number of FIR channels, and each of the required M-1 noise vectors can be computed from one of M-1 covariance matrices corresponding to properly chosen M-1 (distinct) pairs of channel outputs. Any M-1 pairs of channel outputs that span a "tree" pattern (Figure 1) are a proper choice. The MNS method is much more efficient in computation than the SS method. Simulations have shown that the MNS method is only slightly less robust to channel noise than the SS method.

2.  Channel  model  and  the  SS  method 

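We consider M parallel FIR channels driven by a common source. The output vector of the $i$th channel can be written as

$$\mathbf{y}_i(n) = \mathbf{H}_i \mathbf{s}(n) + \mathbf{w}_i(n)$$

where

$$\mathbf{y}_i(n) = [\,y_i(n) \;\; y_i(n+1) \;\; \cdots \;\; y_i(n+N-1)\,]^T$$

$$\mathbf{s}(n) = [\,s(n-L) \;\; s(n-L+1) \;\; \cdots \;\; s(n+N-1)\,]^T$$

$$\mathbf{w}_i(n) = [\,w_i(n) \;\; w_i(n+1) \;\; \cdots \;\; w_i(n+N-1)\,]^T$$

$$\mathbf{H}_i = \begin{bmatrix} h_i(L) & \cdots & h_i(0) & 0 & \cdots & 0 \\ 0 & h_i(L) & \cdots & h_i(0) & \cdots & 0 \\ \vdots & & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & h_i(L) & \cdots & h_i(0) \end{bmatrix} \quad (N \times (N+L)).$$

$y_i(n)$ denotes the output sequence of the $i$th channel; $s(n)$ the input sequence; $w_i(n)$ the noise sequence on the $i$th channel; and $h_i(k)$ the impulse response of the $i$th channel. $L$ denotes the maximum order of the M channels; and $N$ the window length on each channel output. Then we write

$$\mathbf{y}(n) = \mathbf{H}\mathbf{s}(n) + \mathbf{w}(n)$$

where

$$\mathbf{y}(n) = \begin{bmatrix} \mathbf{y}_1(n) \\ \vdots \\ \mathbf{y}_M(n) \end{bmatrix}, \quad \mathbf{w}(n) = \begin{bmatrix} \mathbf{w}_1(n) \\ \vdots \\ \mathbf{w}_M(n) \end{bmatrix}, \quad \text{and} \quad \mathbf{H} = \begin{bmatrix} \mathbf{H}_1 \\ \vdots \\ \mathbf{H}_M \end{bmatrix}.$$

The matrix $\mathbf{H}$ is known as the $MN \times (N+L)$ generalized Sylvester matrix [6], which has the full column rank $N+L$ under the assumptions: A1) the M channels do not share a common zero; and A2) $N \geq L+1$. The blind identification problem here is to find $\mathbf{H}$ from the sequence $\{\mathbf{y}(n)$ for $n = 1, 2, \ldots, T\}$. The SS method [1] exploits the covariance matrix of all channel outputs:

$$\mathbf{R}_y = \frac{1}{T} \sum_{n=1}^{T} \mathbf{y}(n)\mathbf{y}(n)^H$$

where $^H$ denotes the conjugate transpose. This matrix has the inherent structure:

$$\mathbf{R}_y = \mathbf{H}\mathbf{R}_s\mathbf{H}^H + \mathbf{R}_w \quad \text{with} \quad \mathbf{R}_s = \frac{1}{T} \sum_{n=1}^{T} \mathbf{s}(n)\mathbf{s}(n)^H \quad \text{and} \quad \mathbf{R}_w = \frac{1}{T} \sum_{n=1}^{T} \mathbf{w}(n)\mathbf{w}(n)^H.$$

The SS method then computes the eigendecomposition of $\mathbf{R}_y$:

$$\mathbf{R}_y = [\,\mathbf{U}_s \;\; \mathbf{U}_n\,] \begin{bmatrix} \boldsymbol{\Lambda}_s & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Lambda}_n \end{bmatrix} [\,\mathbf{U}_s \;\; \mathbf{U}_n\,]^H$$

where the matrix $\mathbf{U}_n$ consists of the $MN-N-L$ non-principal eigenvectors of $\mathbf{R}_y$. In addition to the assumptions A1-A2, if A3) the source covariance matrix $\mathbf{R}_s$ has the full rank $N+L$, and A4) the noise covariance matrix $\mathbf{R}_w$ is proportional to the identity matrix (which is true when the noise is white and $T$ is very large), then it can be shown [1] that range($\mathbf{U}_n$) is the orthogonal complement of range($\mathbf{H}$). Hence, range($\mathbf{U}_n$) is referred to as the noise subspace. The SS method yields an estimate of $\mathbf{H}$ by solving the equation $\mathbf{U}_n^H \mathbf{H}_e = 0$ in a least squares sense (where $\mathbf{H}_e$ is subject to the same structure as $\mathbf{H}$). This estimate is uniquely (up to a constant scalar) equal to $\mathbf{H}$ under the assumptions A1-A4 [1].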
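
As a rough numerical check of the structure described above, the sketch below builds a generalized Sylvester matrix from randomly drawn channel taps and verifies that the non-principal eigenvectors are orthogonal to its columns. The helper names and the choice $\mathbf{R}_s = \mathbf{I}$ with no noise are assumptions made only for illustration.

```python
import numpy as np

def sylvester_block(h_i, N):
    """N x (N+L) block H_i: row r holds h_i(L), ..., h_i(0) starting at column r."""
    L = len(h_i) - 1
    H_i = np.zeros((N, N + L), dtype=complex)
    for r in range(N):
        H_i[r, r:r + L + 1] = h_i[::-1]
    return H_i

def generalized_sylvester(h, N):
    """Stack the M blocks into the MN x (N+L) matrix H; h has shape (M, L+1)."""
    return np.vstack([sylvester_block(h_i, N) for h_i in h])

def noise_subspace(R_y, signal_dim):
    """Eigenvectors of R_y other than the `signal_dim` principal ones."""
    eigvals, eigvecs = np.linalg.eigh(R_y)        # eigenvalues in ascending order
    return eigvecs[:, :R_y.shape[0] - signal_dim]

rng = np.random.default_rng(0)
M, L, N = 3, 2, 3                                 # illustrative sizes with N = L + 1
h = rng.standard_normal((M, L + 1)) + 1j * rng.standard_normal((M, L + 1))
H = generalized_sylvester(h, N)
R_y = H @ H.conj().T                              # noiseless covariance with R_s = I
U_n = noise_subspace(R_y, N + L)                  # MN x (MN - N - L)
```

With these sizes `U_n` has $MN-N-L = 4$ columns, and `U_n.conj().T @ H` vanishes to machine precision, which is the orthogonality relation the SS method exploits.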


3.  The  MNS  method 

In the MNS method, we first select M-1 distinct pairs from the M channel outputs $\{\mathbf{y}_i(n),\ i = 1, \ldots, M\}$. The M-1 pairs must span a "tree" which connects all M channel outputs. The channel outputs are the "nodes" of the tree as shown in Figure 1.


Figure 1: This illustrates a "tree" which connects M=5 channel outputs as its "nodes". A tree must have no loop and connect all its nodes. Here, the nodes 2, 4, and 5 are "ending" nodes, and the nodes 1 and 3 are "branching" nodes. (The tree spanned by M-1 pairs of channel outputs is the same as the tree by M-1 pairs of the columns of $\mathbf{H}(z)$ in the proof for Lemma 3.)


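Then for each pair of channel outputs, we compute the covariance matrix

$$\mathbf{R}^{ij} = \frac{1}{T} \sum_{n=1}^{T} \begin{bmatrix} \mathbf{y}_i(n) \\ \mathbf{y}_j(n) \end{bmatrix} \begin{bmatrix} \mathbf{y}_i(n) \\ \mathbf{y}_j(n) \end{bmatrix}^H$$

and its least dominant eigenvector $\mathbf{v}^{ij}$. Let

$$\mathbf{v}^{ij} = \begin{bmatrix} \mathbf{v}^{ij}(i) \\ \mathbf{v}^{ij}(j) \end{bmatrix}$$

where each subvector has the dimension $N \times 1$. Then define the "zero padded" vector

$$\bar{\mathbf{v}}^{ij} = \begin{bmatrix} \bar{\mathbf{v}}^{ij}(1) \\ \vdots \\ \bar{\mathbf{v}}^{ij}(M) \end{bmatrix} \quad \text{where} \quad \bar{\mathbf{v}}^{ij}(k) = \begin{cases} \mathbf{v}^{ij}(i) & k = i \\ \mathbf{v}^{ij}(j) & k = j \\ \mathbf{0} & \text{otherwise.} \end{cases}$$

Then we form an $MN \times (M-1)$ matrix $\mathbf{V}_n$ of the M-1 vectors $\{\bar{\mathbf{v}}^{ij}\}$. Similar to the SS method, the MNS method yields an estimate $\mathbf{H}_e$ of $\mathbf{H}$ by solving the equation $\mathbf{V}_n^H \mathbf{H}_e = 0$ in a least squares sense (where $\mathbf{H}_e$ is subject to the same structure as $\mathbf{H}$). The significant computational advantage of the MNS method over the SS method is obvious. In particular, the SS method requires a full eigendecomposition of an $MN \times MN$ matrix, but the MNS method computes the single least dominant eigenvector of a $2N \times 2N$ matrix in parallel for each of M-1 pairs of channel outputs.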
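
The procedure above can be sketched numerically as follows. This is an illustrative, noiseless toy case under stated assumptions: random complex channel taps, $N = L+1$ so that each pair covariance has a one-dimensional noise subspace, and a structured least squares solve implemented by expanding $\mathbf{V}_n^H \mathbf{H}_e$ as a linear map of the channel taps. All function names and the particular tree are hypothetical.

```python
import numpy as np

def generalized_sylvester(h, N):
    """MN x (N+L) matrix H built from channel taps h of shape (M, L+1)."""
    M, L1 = h.shape
    H = np.zeros((M * N, N + L1 - 1), dtype=complex)
    for i in range(M):
        for r in range(N):
            H[i * N + r, r:r + L1] = h[i, ::-1]
    return H

def mns_noise_vectors(Y, N, pairs):
    """One zero-padded noise vector per tree edge (i, j); Y is MN x T."""
    MN, T = Y.shape
    V = np.zeros((MN, len(pairs)), dtype=complex)
    for col, (i, j) in enumerate(pairs):
        Z = np.vstack([Y[i * N:(i + 1) * N], Y[j * N:(j + 1) * N]])
        R_ij = Z @ Z.conj().T / T                 # 2N x 2N pair covariance
        _, eigvecs = np.linalg.eigh(R_ij)         # eigenvalues in ascending order
        v = eigvecs[:, 0]                         # least dominant eigenvector
        V[i * N:(i + 1) * N, col] = v[:N]
        V[j * N:(j + 1) * N, col] = v[N:]
    return V

def solve_structured(V, M, L, N):
    """Least squares solution of V^H H_e = 0 over Sylvester-structured H_e."""
    cols = []
    for p in range(M * (L + 1)):
        basis = np.zeros(M * (L + 1), dtype=complex)
        basis[p] = 1.0
        H_p = generalized_sylvester(basis.reshape(M, L + 1), N)
        cols.append((V.conj().T @ H_p).ravel())
    A = np.array(cols).T                          # linear map: taps -> vec(V^H H_e)
    _, _, Vh = np.linalg.svd(A)
    return Vh[-1].conj().reshape(M, L + 1)        # unit-norm minimizer of ||A h||

rng = np.random.default_rng(1)
M, L, T = 4, 2, 200
N = L + 1
h = rng.standard_normal((M, L + 1)) + 1j * rng.standard_normal((M, L + 1))
s = rng.choice([1, -1], T + N + L) + 1j * rng.choice([1, -1], T + N + L)
S = np.array([s[n:n + N + L] for n in range(T)]).T   # columns play the role of s(n)
Y = generalized_sylvester(h, N) @ S                  # noiseless channel outputs
tree = [(0, 1), (0, 2), (2, 3)]                      # M-1 edges, connected, no loop
h_e = solve_structured(mns_noise_vectors(Y, N, tree), M, L, N)
match = abs(np.vdot(h_e, h)) / np.linalg.norm(h)     # 1.0 means equal up to a scalar
```

In this noiseless setting `match` is expected to be numerically indistinguishable from 1, consistent with the uniqueness claim established below; with noisy data the pair covariances would be perturbed and the estimate would degrade accordingly.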

We will now establish that under the assumptions A1-A4, a) the MNS method yields the unique estimate of $\mathbf{H}$, and b) M-1 is the smallest number of vectors from the noise subspace in order for an equation like $\mathbf{V}_n^H \mathbf{H}_e = 0$ to yield the unique estimate of $\mathbf{H}$. We note that due to the
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limited  space,  the  proofs  shown  below  might  be  too  brief 
for  some  readers. 

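Lemma 1 (easy to prove): For any equation $\mathbf{v}^H \mathbf{H} = 0$ where $\mathbf{v} = [\,\mathbf{v}(1)^T \;\cdots\; \mathbf{v}(M)^T\,]^T$ with $\mathbf{v}(i) = [\,v_i(0) \;\cdots\; v_i(N-1)\,]^T$ and $\mathbf{H}$ is an $MN \times (N+L)$ generalized Sylvester matrix, there uniquely corresponds a polynomial equation

$$\sum_{i=1}^{M} V_i(z) H_i(z) = 0$$

of degree $N+L-1$, where $H_i(z) = \sum_{l=0}^{L} h_i(l) z^{-l}$ is of degree $L$ and $V_i(z) = \sum_{l=0}^{N-1} v_i^*(l) z^{l}$ is of degree $N-1$. The converse is also true.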

Lemma 2: If there are $q$ $MN \times 1$ vectors $\{\mathbf{v}_i$ for $i = 1, \ldots, q\}$ such that $\{\mathbf{v}_i^H \mathbf{H} = 0$ for $i = 1, \ldots, q\}$ where $\mathbf{H}$ is an $MN \times (N+L)$ generalized Sylvester matrix, then $\mathbf{H}$ is (possibly) unique up to a constant scalar only if $q \geq M-1$.

Proof: Using Lemma 1, it is straightforward to show that $\{\mathbf{v}_i^H \mathbf{H} = 0$ for $i = 1, \ldots, q\}$ is equivalent to the polynomial matrix equation $\mathbf{V}(z)\mathbf{h}(z) = 0$ of degree $N+L-1$, where $\mathbf{V}(z)$ is a $q \times M$ polynomial matrix of degree $N-1$ uniquely corresponding to $\{\mathbf{v}_i$ for $i = 1, \ldots, q\}$ and $\mathbf{h}(z)$ is an $M \times 1$ polynomial vector of degree $L$ uniquely corresponding to $\mathbf{H}$. But using the polynomial matrix theory [5], $\mathbf{h}(z)$ is determined by the equation $\mathbf{V}(z)\mathbf{h}(z) = 0$ uniquely up to a polynomial (or constant) scalar only if $q \geq M-1$.

It is easy to show that under the assumptions A1-A4, the vector $\bar{\mathbf{v}}^{ij}$ satisfies $(\bar{\mathbf{v}}^{ij})^H \mathbf{H} = 0$. Since the MNS method only relies on M-1 noise vectors, Lemma 2 has now established that the MNS method exploits a "minimum" noise subspace.

Lemma 3: The MNS method yields the unique (up to a constant scalar) estimate of the channel responses under the assumptions A1-A4.

Proof: From Lemma 1, the equation $(\bar{\mathbf{v}}^{ij})^H \mathbf{H} = 0$ is equivalent to a polynomial equation $V_i(z)H_i(z) + V_j(z)H_j(z) = 0$ of degree $N+L-1$, where $V_i(z)$ and $V_j(z)$ are of degree $N-1$ and $H_i(z)$ and $H_j(z)$ are of degree $L$. Similarly, each sub-equation $(\bar{\mathbf{v}}^{ij})^H \mathbf{H}_e = 0$ in the overall MNS estimation equation $\mathbf{V}_n^H \mathbf{H}_e = 0$ is equivalent to a polynomial equation $V_i(z)H_{ei}(z) + V_j(z)H_{ej}(z) = 0$ where the degrees of all polynomials are the same as in the previous polynomial equation. Combining these two polynomial equations yields $H_j(z)H_{ei}(z) - H_i(z)H_{ej}(z) = 0$.

Using this equation for each of the M-1 pairs of channels, one can show that the solution to $\mathbf{V}_n^H \mathbf{H}_e = 0$ is equivalent to that of the polynomial matrix equation $\mathbf{H}(z)\mathbf{h}_e(z) = 0$ of degree $2L$, where $\mathbf{H}(z)$ is an $(M-1) \times M$ polynomial matrix of degree $L$ uniquely corresponding to $\{H_i(z)$ for $i = 1, \ldots, M\}$, and $\mathbf{h}_e(z)$ is an $M \times 1$ polynomial vector of degree $L$ uniquely corresponding to $\{H_{ei}(z)$ for $i = 1, \ldots, M\}$ (or equivalently $\mathbf{H}_e$). Note that each row of $\mathbf{H}(z)$ only has two nonzero entries and hence defines a pair of columns. The M-1 pairs of columns defined by the M-1 rows of $\mathbf{H}(z)$ also span a "tree" which connects all M columns of $\mathbf{H}(z)$ as its "nodes". This tree is identical to the tree spanned by the pairs of channel outputs (Figure 1). Because of this structure in $\mathbf{H}(z)$, one can show by induction that $\mathbf{H}(z)$ has the full row rank M-1. (Note that removing a column and a row of $\mathbf{H}(z)$ associated with an "ending node" decreases the rank of $\mathbf{H}(z)$ by one, and when $\mathbf{H}(z)$ is $1 \times 2$ its rank is one.) Therefore, the solution for the $M \times 1$ vector $\mathbf{h}_e(z)$ to the equation $\mathbf{H}(z)\mathbf{h}_e(z) = 0$ must be unique up to a polynomial scalar [5]. Furthermore, since $\mathbf{h}(z)$ is a solution of degree $L$ to $\mathbf{H}(z)\mathbf{h}_e(z) = 0$ and there is no common zero among all channels (A1), $\mathbf{h}(z)$ must be the unique solution up to a constant scalar.

Lemma  3  has  established  that  the  MNS  method  yields 
asymptotically  the  unique  estimate  of  H.  This  section  has 
provided  a  much  stronger  result  than  a  discussion  in  [4]  on 
the  MNS  method. 

4.  Performance  of  the  MNS  method 

In our simulation, we used a system of four (M=4) parallel FIR channels. The first channel is given by the GSM test channel [7] with 6 (L=5) delayed paths. The other three channels are generated by assuming a plane propagation model for each path with corresponding electric angles uniformly distributed in $[0, \pi/3]$. A realization of the
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channel  impulse  responses  is  shown  in  the  table  shown 
below. The output observation noise is an i.i.d. sequence of zero-mean Gaussian variables. The input signal is an i.i.d. sequence of zero-mean, unit-variance QAM-4 variables independent from the noise. The performance is measured by

$$\mathrm{MSE(dB)} = 10 \log_{10} \left\{ \frac{1}{N_r} \sum_{r=1}^{N_r} \|\mathbf{h}_r - \mathbf{h}\|^2 \right\}$$

where $N_r$ is the number of independent runs ($N_r = 100$), $\mathbf{h}$ is the true (unit-norm) vector of the impulse responses $\{h_i(k)$ for $i = 1, \ldots, M$ and $k = 0, \ldots, L\}$, and $\mathbf{h}_r$ is the estimated (unit-norm) vector of impulse responses at the $r$th run. (The equation $\mathbf{V}_n^H \mathbf{H}_e = 0$ was solved subject to $\|\mathbf{h}_e\| = 1$. For each run, $\mathbf{h}_r = \alpha \mathbf{h}_e$ where $\alpha = \mathbf{h}_e^H \mathbf{h}$ is a phase adjuster.)
The signal-to-noise ratio is defined as

$$\mathrm{SNR(dB)} = 20 \log_{10} \frac{\sigma_s}{\sigma_w}$$

where $\sigma_s$ and $\sigma_w$ denote the deviations of the input and the noise respectively. Figure 2 compares the performances of the SS and MNS methods. This figure (associated with the case defined by the table) is quite typical among all the cases that we considered in our simulation. In the operational region where the MSE is relatively small, the MNS method required SNR no more than 3 dB higher than the SS method to yield a given value of MSE.
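
The phase adjustment and error measure described above can be sketched as follows. This assumes the MSE is the run-averaged squared distance between the phase-adjusted estimate and the true vector, which is a reading of the text rather than a confirmed formula; the function names are hypothetical.

```python
import numpy as np

def align_phase(h_e, h):
    """Return h_r = alpha * h_e with alpha = h_e^H h (the phase adjuster)."""
    alpha = np.vdot(h_e, h)            # np.vdot conjugates its first argument
    return alpha * h_e

def mse_db(h, estimates):
    """10 log10 of the squared error ||h_r - h||^2 averaged over the runs."""
    errors = [np.linalg.norm(align_phase(h_e, h) - h) ** 2 for h_e in estimates]
    return 10.0 * np.log10(np.mean(errors))
```

An estimate that differs from a unit-norm `h` only by a unit-modulus factor is mapped back onto `h` exactly by `align_phase`, so the measure is insensitive to the scalar phase ambiguity inherent in blind identification.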
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Figure  2:  Performance  comparison  of  the  SS  method  and  the 
MNS method. MSE versus SNR.

Table: Impulse responses of the M-channel system. Other parameters are M=4, L=5, N=6, T=245.

k      h1(k)               h2(k)               h3(k)               h4(k)
k=0    0.4972-1.2784i      1.3516-0.2333i      0.8970+1.0377i      -0.4264+1.3037i
k=1    -0.0370+0.7256i     -0.5251+0.5021i     0.7265+0.0046i      -0.5314-0.4955i
k=2    1.4158+0.2768i      0.8012+1.1996i      -0.2867+1.4138i     -1.2052+0.7927i
k=3    0.6417+0.4440i      0.2181+0.7492i      -0.3031+0.7190i     -0.6886+0.3669i
k=4    -1.2418+0.5984i     -1.2837+0.5023i     -1.3182+0.4032i     -1.3450+0.3018i
k=5    0.0235+1.1777i      -0.7049+0.9438i     -1.1360+0.3118i     -1.0879-0.4518i
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ABSTRACT 

This paper is concerned with the problem of blind separation of independent signals (sources) from their linear convolutive mixtures. The problem consists of recovering the sources up to shaping filters from the observations of MIMO system output. The various signals are assumed to be linear but not necessarily i.i.d. (independent and identically distributed). An iterative, normalized higher-order cumulant maximization based approach is developed using the third-order and/or fourth-order normalized cumulants of the "beamformed" data. The approach is source-iterative, i.e., the sources are extracted (at each sensor) and cancelled one-by-one. The proposed solution provides a decomposition of the given data at each sensor into its independent signal components. The proposed approach is an extension/application of a recently proposed approach for MIMO system identification where the system is driven by unobserved i.i.d. inputs.

1  Introduction 

Given measurements $y_i(k)$, $(i = 1, 2, \ldots, N)$, at time $k$ at $N$ sensors, let these measurements be a linear convolutive mixture of $M$ source signals $x_j(k)$, $(j = 1, 2, \ldots, M)$:

$$y_i(k) = \sum_{j=1}^{M} G_{ij}(q^{-1})\, x_j(k) \tag{1-1}$$

$$\mathbf{y}(k) = \mathbf{G}(q^{-1})\, \mathbf{x}(k), \tag{1-2}$$

where the $ij$-th element of $\mathbf{G}(q^{-1})$ is $G_{ij}(q^{-1})$, $\mathbf{y}(k) = [\,y_1(k) \;\vdots\; \cdots \;\vdots\; y_N(k)\,]^T$, similarly for $\mathbf{x}(k)$, $q^{-1}$ is the backward-shift operator (i.e., $q^{-1}x(k) = x(k-1)$, etc.), $x_j(k)$ is the $j$-th input at sampling time $k$, $y_i(k)$ is the $i$-th output, and

$$G_{ij}(q^{-1}) := \sum_{l=-\infty}^{\infty} g_{ij}(l)\, q^{-l} \tag{1-3}$$

is the scalar transfer function with $x_j(k)$ as the input and $y_i(k)$ as the output. We will also use the notation

$$G_{ij}(z) := G_{ij}(q^{-1})\big|_{q=z} = \sum_{l=-\infty}^{\infty} g_{ij}(l)\, z^{-l} \tag{1-4}$$

for the $\mathcal{Z}$-transform of the sequence $\{g_{ij}(l)\}$. We allow all of the above variables to be complex-valued.

This work was supported by the National Science Foundation under Grant MIP-9312559.

We wish to design a MIMO dynamic system $\mathbf{E}(q^{-1})$ with $N$ inputs and $M$ outputs such that the overall $M \times M$ system

$$\mathbf{T}(q^{-1}) := \mathbf{E}(q^{-1})\, \mathbf{G}(q^{-1}) \tag{1-5}$$

decouples the source signals. Following the $2 \times 2$ case considered in [7], this implies that we must have

$$T_{ij}(q^{-1}) \; \begin{cases} = 0 & \text{for } i \neq i_j \\ \neq 0 & \text{for } i = i_j \end{cases}$$

where $i = 1, 2, \ldots, M$; $j = 1, 2, \ldots, M$; and $i_j \in \{1, 2, \ldots, M\}$ such that $i_j \neq i_l$ for $j \neq l$. That is, in every column and every row of $\mathbf{T}(q^{-1})$ there is exactly one non-zero entry. In a blind separation problem, the nonzero entries of $\mathbf{T}(q^{-1})$ are allowed to be a scalar linear system (shaping filter), unlike the equalization problems where they must be constant gains.

The problem considered above arises in a wide variety of signal processing and communications applications; see [1]-[8], and references therein. One obvious application is array signal processing where the array manifold may be unknown or imprecisely known [5]. Separation of sources differs from blind equalization [9],[10],[13],[14],[17] in that the source signals are not necessarily i.i.d. (independent and identically distributed). In this paper we allow $N \geq M$ ($N$ = number of sensors, $M$ = number of sources) with $M$ arbitrary, whereas quite a few existing papers are restricted to $M = N = 2$ ([1],[7],[8]) or $M = N$ ([2],[3],[17]). Our proposed approach has aspects that follow from [14] (see also [16]), yet our approach is more general in that it applies to signals with nonzero normalized fourth cumulant whereas [14] and [16] are restricted to signals with negative normalized fourth cumulant. Moreover, [16] deals with instantaneous mixtures of a restricted type and [14] deals with blind equalization, not source separation.

2  Model  Assumptions 

We impose the following conditions:

(AS1) $N \geq M$, at least as many outputs as inputs.

(AS2) The various components of $\mathbf{x}(k)$ are mutually independent and the coupling system is stable.

(AS3) $\mathbf{x}(k)$ is linear, i.e.

$$\mathbf{x}(k) = \mathbf{F}(q^{-1})\, \mathbf{w}(k), \tag{2-1}$$

where $\mathbf{w}(k)$ is a zero-mean, $M$-vector stationary non-Gaussian process, temporally i.i.d. and spatially independent, with nonzero fourth cumulants. Because of (AS2) we may take $\mathbf{F}(q^{-1})$ to be diagonal. Assume also that the composite system

$$\mathbf{y}(k) = \mathbf{G}(q^{-1})\, \mathbf{F}(q^{-1})\, \mathbf{w}(k) =: \mathbf{B}(q^{-1})\, \mathbf{w}(k), \tag{2-2}$$

is stable. Let $\mathbf{B}(z)$ denote the transfer function $\mathbf{B}(q^{-1})$ in the $\mathcal{Z}$-transform notation. Assume that $\mathrm{rank}\{\mathbf{B}(z)\} = M$ for any $|z| = 1$.

We will denote the $ij$-th element of $\mathbf{B}(q^{-1})$ by $B_{ij}(q^{-1})$.

3  A  Solution 

Let $\mathrm{CUM}_4(w)$ denote the fourth-order cumulant of a complex-valued random variable $w$, defined as

$$\mathrm{CUM}_4(w) := \mathrm{cum}_4\{w, w^*, w, w^*\} \tag{3-1}$$

$$= E\{|w|^4\} - 2\left[E\{|w|^2\}\right]^2 - \left|E\{w^2\}\right|^2. \tag{3-2}$$

We will use the notation $\gamma_{4w_i} = \mathrm{CUM}_4(w_i(k))$ and $\sigma_{w_i}^2 = E\{|w_i(k)|^2\}$. Consider a $1 \times N$ row-vector polynomial equalizer $\mathbf{C}^T(q^{-1})$, with its $j$-th entry denoted by $C_j(q^{-1})$, operating on the data vector $\mathbf{y}(k)$. Let the equalizer output be denoted by $e(k)$. We then have

$$e(k) = \sum_{i=1}^{N} C_i(q^{-1})\, y_i(k) = \sum_{i=1}^{N} \sum_{j=1}^{M} C_i(q^{-1})\, B_{ij}(q^{-1})\, w_j(k) = \sum_{j=1}^{M} H_j(q^{-1})\, w_j(k) \tag{3-3}$$

where

$$H_j(q^{-1}) := \sum_{i=1}^{N} C_i(q^{-1})\, B_{ij}(q^{-1}) \tag{3-4}$$

so that

$$\mathbf{H}(q^{-1}) := \mathbf{C}^T(q^{-1})\, \mathbf{G}(q^{-1})\, \mathbf{F}(q^{-1}).$$

In general, we have

$$H_j(q^{-1}) = \sum_{k=-\infty}^{\infty} h_j(k)\, q^{-k}. \tag{3-5}$$

Define $\bar{h}_j(k) = \sigma_{w_j} h_j(k)$, $\bar{\gamma}_{4w_j} = \gamma_{4w_j} / \sigma_{w_j}^4$ and $|\bar{\gamma}_{4\max}| = \max_{1 \leq j \leq M} |\bar{\gamma}_{4w_j}|$.

As in [13], we propose to consider maximization of the cost

$$J_{42} := \frac{|\mathrm{CUM}_4(e(k))|}{\left[E\{|e(k)|^2\}\right]^2} \tag{3-6}$$

for designing a linear equalizer to recover one of the inputs. It can be shown [13] that

$$J_{42} \leq |\bar{\gamma}_{4\max}| \tag{3-7}$$

with equality iff

$$h_j(k) = d\, \delta(k - k_0)\, \delta(j - j_0), \quad j_0 \in \{1, 2, \ldots, M\}, \tag{3-8}$$

where $d$ is some complex constant, $k_0$ is some integer, $j_0$ indexes some input out of the given $M$ inputs such that $|\bar{\gamma}_{4w_{j_0}}| = |\bar{\gamma}_{4\max}|$, and $\delta(k - k_0) = 1$ if $k = k_0$, $= 0$ otherwise.

Thus, (3-3) reduces to

$$e(k) = d\, w_{j_0}(k - k_0), \tag{3-9}$$

i.e., the equalizer output is a possibly scaled and shifted version of one of the system inputs. It has been established in [13] that under (AS1)-(AS3), such a solution exists and all locally stable stationary points of the given cost w.r.t. the combined composite channel-equalizer impulse response $\{h_j(k)\}$ are characterized by solutions such as (3-8) and (3-9). Moreover, if doubly-infinite equalizers are used then all locally stable stationary points of the given cost w.r.t. the equalizer coefficients are also characterized by solutions such as (3-8) and (3-9).

The above discussion suggests an iterative solution where we iterate on input sequences one-by-one. Maximization of (3-6) w.r.t. the equalizer $\mathbf{C}^T(q^{-1})$ leads to the solution (3-9) under the sufficient conditions (AS1)-(AS3). Given (3-9) we can estimate and remove the contribution of $w_{j_0}(k)$ from (1-1). Then we have a MIMO system with $N$ outputs but $M - 1$ inputs (instead of $M$ inputs as in (1-1)-(1-2)). Repeat the process, i.e., maximize (3-6) w.r.t. a new equalizer to get a solution $e(k) = d'\, w_{j_0'}(k - k_0')$ where $j_0' \in (\{1, 2, \ldots, M\} - \{j_0\})$. That is, we follow the following procedure.

Step 1. Maximize (3-6) w.r.t. the equalizer $\mathbf{C}(q^{-1})$ to obtain (3-9).

Step 2. Cross-correlate $\{e(k)\}$ (of (3-9)) with the given data (1-1)-(2-2) and define a possibly scaled and shifted estimate of $b_{ij_0}(\tau)$ as

$$\hat{b}_{ij_0}(\tau) := \frac{E\{y_i(k)\, e^*(k - \tau)\}}{E\{|e(k)|^2\}}. \tag{3-10}$$

Consider now the reconstructed contribution of $e(k)$ to the data $y_i(k)$ $(i = 1, 2, \ldots, N)$, denoted by $\hat{y}_{i,j_0}(k)$:

$$\hat{y}_{i,j_0}(k) := \sum_{l} \hat{b}_{ij_0}(l)\, e(k - l). \tag{3-11}$$

Step 3. Remove the above contribution from the data to define the outputs of a MIMO system with $N$ outputs and $M - 1$ inputs. These are given by

$$y_i'(k) := y_i(k) - \hat{y}_{i,j_0}(k). \tag{3-12}$$

Step 4. If $M > 1$, set $M \leftarrow M - 1$, $y_i(k) \leftarrow y_i'(k)$, and go back to Step 1, else quit.
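
The cost (3-6) can be evaluated from samples as sketched below. This is a minimal illustration assuming zero-mean data and sample averages in place of expectations; it only evaluates the cost for given sequences and does not perform the maximization of Step 1. The function names are hypothetical.

```python
import numpy as np

def cum4(w):
    """Sample version of (3-2) for a zero-mean complex sequence."""
    m2 = np.mean(np.abs(w) ** 2)
    return np.mean(np.abs(w) ** 4) - 2.0 * m2 ** 2 - np.abs(np.mean(w ** 2)) ** 2

def j42(e):
    """Normalized fourth-order cumulant cost (3-6)."""
    return np.abs(cum4(e)) / np.mean(np.abs(e) ** 2) ** 2

rng = np.random.default_rng(2)
T = 20000

def qam4(size):
    """i.i.d. 4-QAM symbols taking the values +-1 +-j."""
    return rng.choice([1, -1], size) + 1j * rng.choice([1, -1], size)

w1, w2 = qam4(T), qam4(T)
single = j42(w1)          # equalizer output containing one input only
mixed = j42(w1 + w2)      # equalizer output mixing two inputs equally
```

For 4-QAM inputs the normalized fourth cumulant has magnitude 1, so `single` is close to 1, while the equal-weight mixture gives a value near 0.5. This is consistent with the bound (3-7) being attained only when a single input survives at the equalizer output.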
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In  practice,  all  the  expectations  in  (3-10)  are  replaced  with 
their  sample  averages  over  appropriate  data  records. 
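Analyzing the above algorithm we have

$$E\{y_i(k)\, e^*(k - \tau)\} = \sum_{j=1}^{M} \sum_{l} b_{ij}(l)\, E\{w_j(k - l)\, e^*(k - \tau)\} = b_{ij_0}(k_0 + \tau)\, d^*\, \sigma_{w_{j_0}}^2. \tag{3-13}$$

Using (3-13) in (3-10) we have

$$\hat{b}_{ij_0}(\tau) = b_{ij_0}(k_0 + \tau) / d. \tag{3-14}$$

It follows from (3-11) and (3-14) that

$$\hat{y}_{i,j_0}(k) = \sum_{l} b_{ij_0}(l)\, w_{j_0}(k - l). \tag{3-15}$$

Now use (3-12) and (3-15) to deduce that

$$y_i'(k) = \sum_{j=1,\, j \neq j_0}^{M} B_{ij}(q^{-1})\, w_j(k), \quad i = 1, 2, \ldots, N. \tag{3-16}$$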


It is seen that we have decomposed the observations at the various sensors into their independent components: $\hat{y}_{i,j_0}(k)$ in (3-11) represents the contribution of $\{w_{j_0}(k)\}$ to the $i$-th sensor. Eqn. (3-11) represents an embarrassment of riches: we have a large class of solutions to the problem of blind separation of convolutive mixtures. We emphasize that in our solution knowledge of $\mathbf{F}(q^{-1})$, $\mathbf{G}(q^{-1})$ or $\mathbf{B}(q^{-1})$ has not been assumed. Our solution is guaranteed to converge unlike that of [7].
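
Steps 2 and 3 can be checked numerically with the sketch below. It assumes Step 1 has already succeeded, so that $e(k)$ is an exactly scaled and shifted copy of one input as in (3-9); it also uses circular shifts and randomly drawn real impulse responses purely for convenience. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N_sensors, M_sources, taps = 20000, 3, 2, 4

def qam4(size):
    """i.i.d. 4-QAM symbols taking the values +-1 +-j."""
    return rng.choice([1, -1], size) + 1j * rng.choice([1, -1], size)

w = np.array([qam4(T) for _ in range(M_sources)])         # inputs w_j(k)
b = rng.standard_normal((N_sensors, M_sources, taps))     # hypothetical b_ij(l)

def contribution(j):
    """Circularly filtered contribution of input j to every sensor."""
    return np.array([sum(b[i, j, l] * np.roll(w[j], l) for l in range(taps))
                     for i in range(N_sensors)])

y = contribution(0) + contribution(1)

# Assume Step 1 succeeded: e(k) = d * w_0(k - k0), as in (3-9).
d, k0 = 0.5 - 1.5j, 2
e = d * np.roll(w[0], k0)

power = np.mean(np.abs(e) ** 2)
y_hat = np.zeros_like(y)
for tau in range(-8, 9):
    e_shift = np.roll(e, tau)                                 # e(k - tau)
    b_hat = np.mean(y * np.conj(e_shift), axis=1) / power     # (3-10), per sensor
    y_hat += b_hat[:, None] * e_shift                         # (3-11)

residual = y - y_hat                                          # (3-12)
rel_err = (np.linalg.norm(residual - contribution(1))
           / np.linalg.norm(contribution(1)))
```

After cancellation, `residual` should match the contribution of the remaining input up to estimation error from the finite record, so `rel_err` is expected to be small and to shrink as `T` grows.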

Remark 1. We may replace the cost (3-6) with [13]

$$J_{32} := \frac{|\mathrm{CUM}_3(e(k))|}{\left[E\{|e(k)|^2\}\right]^{1.5}} \tag{3-17}$$

where

$$\mathrm{CUM}_3(w) := \mathrm{cum}_3\{w, w^*, w\} = E\{|w|^2 w\}. \tag{3-18}$$

The preceding discussion pertaining to (3-6) holds in this case with obvious modifications provided we replace the phrase "nonzero fourth cumulants" in (AS3) with the phrase "nonzero third cumulants." □

Remark 2. It follows from the preceding developments that under the conditions (AS1)-(AS3), the proposed iterative approach is capable of blind identification of a MIMO transfer function $\mathbf{B}(z)$ up to a time-shift, a scaling and a permutation matrix provided that we allow doubly-infinite equalizers. That is, given $\mathbf{B}(z)$, we end up with an $\mathbf{A}(z)$ where the two are related via

$$\mathbf{A}(z) = \mathbf{B}(z)\, \mathbf{D} \boldsymbol{\Lambda} \mathbf{P} \tag{3-19}$$

where $\mathbf{D}$ is an $M \times M$ "time-shift" diagonal matrix (recall $k_0$ in (3-8)), $\boldsymbol{\Lambda}$ is an $M \times M$ diagonal scaling matrix (recall $d$ in (3-8)), and $\mathbf{P}$ is an $M \times M$ permutation matrix (recall $j_0$ in (3-8); we don't "know" which input it refers to). See also [13]. □


4  Simulation  Example 

Here we consider a 2-input 3-output MA(6) system model resulting in $N = 3$ and $M = 2$ in (2-2). Its $3 \times 2$ transfer function $\mathbf{B}(z)$ in (2-2) is given by ($\mathbf{B}_i(q^{-1})$ denotes the $i$-th column of $\mathbf{B}(q^{-1})$)


Fi(g-^) 


0.7426  -h  0.7426g~^ 
0.4456g“^  H-  0.7426g'-2 
0.8911g~^  -h  0.5941g“® 


(4-1) 


F2(r^) 


(0.5678  -h0.3407g-^) 

(-0.23855-“^  -  0.5678g"^  -h  0,8176g'’^ 
+0.4088g"^  H-0.2385g“®) 
(0.6814g*-^  +0.9085g-^) 


(4-2) 

The inputs $\{w_j(k)\}$ $(j = 1, 2)$ in (2-1) and (2-2) are mutually independent, zero-mean i.i.d., 4-QAM sequences taking values $\pm 1 \pm j$ with probability 0.25 each. The additive noise at the various sensors was complex (circularly symmetric) zero-mean white Gaussian with identical variance. The equalizer length was chosen to be 15 taps, i.e., with $N = 3$, $C_i(q^{-1})$ $(i = 1, 2, 3)$ have 15 taps each. The initial guess for optimization of $J_{42}$ was always taken to be center-tap initialization, i.e. we took $c_1(7) = 1$ with the remaining taps $c_i(k)$ $(i = 1, 2, 3)$ set to zero. For the purpose of impulse response estimation and extracted signal cancellation (see Steps 2 and 3 in Sec. 3), $\hat{b}_{ij_0}(\tau)$ was estimated for $-20 \leq \tau \leq 20$ (see (3-10)).

It is clear from (3-11) that how well one estimates the channel impulse response strongly influences how well one can separate the given observations into their constituent independent components. Therefore, we will take accuracy in impulse response estimation as a performance measure. In order to assess the performance of the proposed approach, one first needs to remove the ambiguities associated with the matrices $\mathbf{D}$, $\boldsymbol{\Lambda}$ and $\mathbf{P}$ in (3-19). This was accomplished by aligning (via cross-correlation and shifting) the estimated impulse responses with their true counterparts and by scaling them to have a fixed norm. For instance, the true model (4-1)-(4-2) is such that


$$\sum_{i=1}^{3} \sum_{k=0}^{6} |b_{ij}(k)|^2 = 3 \quad \text{for } j = 1, 2. \tag{4-3}$$

We chose to truncate the estimated impulse responses to 12 samples (after proper alignment with the true impulse responses); this is much longer than the true length of 7. The estimated impulse response $\hat{b}_{ij}(k)$ after truncation was normalized in a manner similar to (4-3):

$$\sum_{i=1}^{3} \sum_{k=-1}^{10} |\hat{b}_{ij}(k)|^2 = 3, \quad j = 1, 2. \tag{4-4}$$

We will use the normalized mean-square error (NMSE) in estimating the channel impulse responses as a performance index. The length of each subchannel $B_{ij}(q^{-1})$ ($i \in \{1, 2, 3\}$, $j \in \{1, 2\}$) was restricted to 12. For $M_c$ Monte Carlo runs, the NMSE$_{ij}$ for subchannel $B_{ij}(q^{-1})$ is defined as


NMSEij 


(4-5) 

where $\hat{b}_{ij}^{(l)}$ denotes the estimate of the $ij$-th subchannel impulse response for the $l$-th Monte Carlo run. The overall NMSE (called ONMSE) is obtained by averaging over all subchannels:

$$\mathrm{ONMSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \mathrm{NMSE}_{ij}. \tag{4-6}$$

Table  I  shows  the  various  NMSE’s  for  different  SNR's  and 
record  lengths  for  using  the  cost  J42.  It  is  seen  that  the 
proposed  method  works  well  even  for  rather  low  average 
SNR  of  13  dB. 


TABLE I. Normalized mean-square error (4-6) in estimating the system matrix channel impulse response. 4-QAM (complex-valued) inputs and cost $J_{42}$. 50 Monte Carlo runs, equalizer length = 15 taps (per subchannel).

Record length    SNR = 33 dB    SNR = 23 dB    SNR = 13 dB    SNR = 3 dB
750              0.0161         0.0167         0.0271         0.5201
1500             0.0080         0.0082         0.0102         0.5091
3000             0.0039         0.0040         0.0053         0.1973

5  Conclusions 

The problem of blind separation of independent linear
signals (sources) from their linear convolutive mixtures was
considered. An iterative, normalized higher-order cumulant
maximization based approach was developed using the
third-order and/or fourth-order normalized cumulants of
the "beamformed" data. The approach is source-iterative,
i.e., the sources are extracted (at each sensor) and cancelled
one-by-one. The proposed solution provides a decomposition
of the given data at each sensor into its independent
signal components. The proposed approach is an
extension/application of a recently proposed approach for
MIMO system identification where the system is driven by
unobserved i.i.d. inputs.
Abstract

When a priori information about the propagation or the
geometry of the array is not available, the model can be
generalized to a blind source separation problem. It
supposes the statistical independence of the sources and
their non-Gaussianity. In this paper, the observed signals
are supposed to be convolutive mixtures of wide-band
sources. Several criteria of source separation are studied,
which are based on the cancellation of different
fourth-order cross-cumulants. For these criteria, we show
under which conditions the separation is achieved. Results on
real data illustrate the proposed methods.

1.  Introduction 

The problem of separating a mixture of several
independent signals is encountered in many fields: in
digital communication over multipath channels, in speech
enhancement (cocktail party problem), or in the diagnosis
of rotating machines. Several methods have recently been
proposed in [1] [2] [3] [4]. The problem, generally called
"blind source separation", consists in identifying p
independent and non-Gaussian sources from M observed
linear mixtures of these sources. These techniques are
necessary when the propagation between sources and
sensors cannot be modelled (unknown paths, unknown
antenna deformation, complicated array geometry, or
plane-wave hypothesis not applicable...).

Several methods [1] [2] [3] [4] [5] have been developed in
the time domain in the case of linear instantaneous mixtures,
using higher-order statistics (usually fourth-order
moments or cumulants, or nonlinear functions of the
observations) or using a deflation approach. In the
frequency domain, several methods based on the
cross-bispectra or the trispectra of the estimated sources have
been proposed in [6] [7] [8]. [9] and [10] propose an
adaptive approach in the time domain in the case of
convolutive mixtures. [11] uses a priori information on
the probability densities of the sources. In a general blind
source separation problem, the observed data vector r(t)
may be represented in the frequency domain by an
instantaneous complex mixture for each frequency bin f. It
leads to the following model:

(1)  $R_k(f) = A(f) S_k(f) + B_k(f)$

where $R_k(f)$ is the N-point Discrete Fourier Transform
(DFT) of the kth data block of the observation r(t). $S_k(f)$
represents the vector of the p sources and A(f) is an unknown
(M × p) matrix which characterizes the linear propagation
from sources to sensors. $B_k(f)$ represents an additive
M-dimensional Gaussian noise. The problem consists in
identifying the matrix A(f) as a product of three matrices:

(2)  $A(f) = V(f) \Lambda(f) \Pi(f)$

The matrices V(f) (a unitary matrix) and $\Lambda(f)$ (a diagonal
matrix) are identified by second-order criteria, through
eigenvalue decomposition of the covariance matrix of
$R_k(f)$. After this usual first step using only second-order
moments (developed in §2), we suppose that the
components of the observations are normalized and
uncorrelated, which is not a restrictive assumption.
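The second-order step can be sketched per frequency bin as below. This is only an illustration under stated assumptions: the function names are hypothetical, blocks are non-overlapping, and the noise contribution to the covariance is ignored (no noise-variance subtraction).

```python
import numpy as np

def block_dft(r, n_fft):
    """r has shape (M, T). Returns the block DFTs with shape (K, n_fft, M),
    K being the number of complete non-overlapping blocks."""
    m, t = r.shape
    k = t // n_fft
    blocks = r[:, :k * n_fft].reshape(m, k, n_fft)
    return np.fft.fft(blocks, axis=2).transpose(1, 2, 0)

def whiten_bin(rf, p):
    """rf has shape (K, M): the K block-DFT vectors R_k(f) at one bin f.
    Projects onto the p dominant eigenvectors of the sample covariance and
    normalizes, so the returned X_k(f) (shape (K, p)) have identity covariance."""
    cov = rf.T @ rf.conj() / rf.shape[0]        # sample estimate of E[R R^H]
    w, v = np.linalg.eigh(cov)                  # ascending eigenvalues
    idx = np.argsort(w)[::-1][:p]               # p dominant ones
    return (rf @ v[:, idx].conj()) / np.sqrt(w[idx])   # rows: (Lambda^-1/2 V^H R_k)^T

rng = np.random.default_rng(0)
rf = rng.standard_normal((500, 3)) + 1j * rng.standard_normal((500, 3))
x = whiten_bin(rf, 2)
print(np.round(x.T @ x.conj() / x.shape[0], 6))  # close to the 2 x 2 identity
```

After this step only a unitary factor per bin remains to be identified, which is the role of the fourth-order criteria.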

In this paper we then focus on the identification of the
matrices $\Pi(f)$ (which are unitary matrices) by means of
fourth-order criteria. In the case of instantaneous
mixtures, two methods have already been proposed [1]
[2]. We focus on the generalization of the
source separation problem to convolutive mixtures of
wide-band sources. C. Jutten proposes in [9] to cancel
certain fourth-order cross-cumulants. The separation is
only proved under the condition of independent,
identically distributed (i.i.d.) processes with the same sign
of kurtosis.

In  this  paper,  we  study  several  criteria  based  on  the 
cancellation  of  other  fourth-order  cross-cumulants  of  the 
estimated  sources  and  we  show  in  which  conditions  the 
separation  is  achieved. 

2. Modeling of the problem

The first step consists in the identification of the matrices
V(f) and $\Lambda(f)$ (which may be adaptively computed). The
signals, denoted $X_k(f)$, obtained by projecting the
observations $R_k(f)$ onto the signal subspace (which is
spanned by the columns of V(f) associated with the
dominant eigenvalues of the covariance matrix of the
observations) are uncorrelated and normalized. The p
normalized sources, called $NS_k(f)$, are related to the p new
data $X_k(f)$ by:

(3)  $NS_k(f) = (\Pi(f) P(f) D(f))^{+} X_k(f)$

where P(f) is a (p × p) permutation matrix and D(f) is a
diagonal one. The superscript '+' denotes the conjugate
transpose.

The unitary matrix $\Pi(f)$ [2] can be decomposed into a
product of Givens rotations. In the case of two sources,
for example, $\Pi(f)$ is a function of two angles $\theta(f)$ and
$\phi(f)$ at frequency bin f. The model of the
normalized sources $NS_1(f)$ and $NS_2(f)$ after the
second-order step is the following:

(4)  $NS_1(f) = \cos(\theta(f)) X_1(f) + \sin(\theta(f)) \exp(-j\phi(f)) X_2(f)$

(5)  $NS_2(f) = \sin(\theta(f)) \exp(j\phi(f)) X_1(f) - \cos(\theta(f)) X_2(f)$
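As a quick numerical check of the two-source parametrization in (4)-(5), the sketch below builds the 2 x 2 matrix that maps $(X_1, X_2)$ to $(NS_1, NS_2)$ and verifies that it is unitary. The function name `two_source_matrix` is hypothetical and only serves this illustration.

```python
import numpy as np

def two_source_matrix(theta, phi):
    """Matrix mapping (X1, X2) to (NS1, NS2) as written in (4)-(5)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s * np.exp(-1j * phi)],
                     [s * np.exp(1j * phi), -c]])

u = two_source_matrix(0.3, 1.1)
print(np.round(u @ u.conj().T, 12))   # identity: the transform is unitary

# theta = 0 leaves X1 unchanged and flips the sign of X2.
x = np.array([1.0 + 2.0j, -0.5j])
print(two_source_matrix(0.0, 0.7) @ x)
```

Because the matrix is unitary for every pair of angles, the normalization and decorrelation obtained at second order are preserved while the angles are adapted.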

In the time domain, the p normalized sources ns(t) are
expressed as convolutive mixtures of the p new data x(t):

(6)  $ns_j(t) = \sum_{i=1}^{p} h_{ij}(t) * x_i(t), \quad j = 1, \ldots, p$

where $x_i(t)$ is the i-th component of the vector x(t) and
$ns_j(t)$ is the j-th component of the vector ns(t). The
vectors $h_{ij}(t)$ represent the finite impulse responses of the
filters between the i-th component of the data x(t) and the
j-th normalized source. They are exactly the N-point
inverse Discrete Fourier Transforms of the filters
characterized in the frequency domain by the N matrices
$(\Pi(f) P(f) D(f))^{+}$. In the case of two sources, the impulse
responses of the filters $h_{ij}(t)$ are the inverse Discrete
Fourier Transforms of $[\cos(\theta(f))]$ and $[\sin(\theta(f)) \exp(-j\phi(f))]$
for f = 0, ..., N-1.

3  Independence  criterion 

Thanks to the previous step (at second order), the
components of x(t) are normalized and uncorrelated. As
the information provided by the second-order statistics is
not sufficient to identify the N matrices $(\Pi(f) P(f) D(f))^{+}$,
we use an additional assumption: the statistical
independence of the sources. The aim of the second step
of the blind source separation procedure is the identification
of the vectors $h_{ij}(t)$ (or the matrices $(\Pi(f) P(f) D(f))^{+}$)
such that the estimated sources $ns_j(t)$ are independent.
If the sources $ns_i(t)$ and $ns_j(t)$ are statistically
independent, each of the following cross-cumulants must
vanish for any delays k, l, m, n smaller than N:

(7)  $C_{22}(ns_i(t-k), ns_i(t-l), ns_j(t-m), ns_j(t-n))$,
     $C_{31}(ns_i(t-k), ns_i(t-l), ns_i(t-m), ns_j(t-n))$,
     $C_{13}(ns_i(t-k), ns_j(t-l), ns_j(t-m), ns_j(t-n))$.

C represents the fourth-order cumulant as defined
in [12]. Strictly speaking, this leads to fourth-order
independence.
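To make the vanishing cross-cumulant condition concrete, here is a small sketch of a sample estimator of the fourth-order cumulant. It assumes real-valued signals and uses the standard moment-to-cumulant formula; the definition in [12] may differ for complex data, and the name `cum4` is hypothetical.

```python
import numpy as np

def cum4(a, b, c, d):
    """Sample fourth-order cumulant of four real-valued sequences.
    Means are removed first, so the zero-mean formula applies:
    cum = E[abcd] - E[ab]E[cd] - E[ac]E[bd] - E[ad]E[bc]."""
    a, b, c, d = (x - np.mean(x) for x in (a, b, c, d))
    m = lambda x, y: np.mean(x * y)
    return np.mean(a * b * c * d) - m(a, b) * m(c, d) - m(a, c) * m(b, d) - m(a, d) * m(b, c)

rng = np.random.default_rng(0)
n = 200_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # unit variance, non-Gaussian
u = rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # independent of s

print(cum4(s, s, s, s))   # auto-cumulant of a unit-variance uniform source, about -1.2
print(cum4(s, s, u, u))   # C22-type cross-cumulant, close to 0
print(cum4(s, u, u, u))   # C13-type cross-cumulant, close to 0
```

For independent sources the cross-cumulants are zero up to estimation noise, while the auto-cumulant stays away from zero because the sources are non-Gaussian.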

In the case of instantaneous mixtures, P. Comon first
proposed to maximize a contrast function based on the
kurtosis of the estimated sources [2], such that the
maxima are obtained for solutions which actually separate
the sources. E. Moreau and O. Macchi proposed in [3] an
approach based on the adaptive maximization of other
contrast functions, using the kurtosis and the fourth-order
cross-cumulants of the estimated sources.


In the case of convolutive mixtures of wide-band sources,
the purpose of this paper is to study several criteria and to
show under which conditions the separation is achieved.
C. Jutten proposes in [9] to cancel the symmetrical
fourth-order cross-cumulants $C_{22}(ns_i(t), ns_i(t), ns_j(t-k), ns_j(t-k))$,
functions of one delay k. He proves that it is a sufficient
condition to separate two sources under the hypothesis of
independent, identically distributed (i.i.d.) processes with
the same sign of kurtosis. The cancellation of the two
other cross-cumulants $C_{13}(ns_i(t), ns_j(t-k), ns_j(t-k), ns_j(t-k))$
and $C_{31}(ns_j(t), ns_i(t-k), ns_i(t-k), ns_i(t-k))$ is not
applicable because spurious solutions exist.

3.1  Study  of  the  criteria 


In [9], the case of one delay is dealt with. We propose in
this section to study the cancellation of two
dissymmetrical cross-cumulants, functions of two delays
k and l:

(8)  $C_{13}(ns_i(t-k), ns_j(t), ns_j(t), ns_j(t-l)) = 0$
     $C_{31}(ns_i(t), ns_i(t), ns_i(t-k), ns_j(t-l)) = 0$

As the estimated sources $ns_i(t)$ and $ns_j(t)$ are sought as
uncorrelated signals, this leads to minimizing the following
cost function $\Psi$:

(9)  $\Psi = \sum_{i \neq j} \sum_{k,l=0}^{N-1} \left( E_{13}(ns_i(t-k)\, ns_j(t)\, ns_j(t)\, ns_j(t-l)) \right)^2$


$\Psi$ can also be computed as:

$$\Psi = \sum_{i \neq j} \sum_{f_1, f_2 = 0}^{N-1} |B_{i,j}(f_1, f_2)|^2$$

where $B_{i,j}(f_1,f_2)$ represents the two-dimensional Fourier
Transform of $E(ns_i(t-k)\, ns_j(t)\, ns_j(t)\, ns_j(t-l))$ with respect to
the time variables k and l. $B_{i,j}(f_1,f_2)$ is equal to
$E\{ns_j(t)^2 NS_i(f_1) NS_j(f_2)\}$. The function $\Psi$ is then
minimized (and equal to zero) when $B_{i,j}(f_1,f_2)$ vanishes
for all frequency bins $f_1$ and $f_2$. The cancellation of
$B_{i,j}(f_1,-f_1)$ leads to two types of solutions: the first one
achieves the separation while the second one consists of
spurious solutions. These spurious solutions only exist
in the case of sources with identical fourth-order
statistical properties. However, we can show, by computing
the value of the function $\Psi$, that these spurious solutions
do not cancel it. Consequently, the cancellation of $\Psi$ (9)
always ensures the separation of the sources.
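The time-domain form (9) of the cost can be illustrated with sample moments. The sketch below assumes real-valued signals, a small number of lags, and hypothetical function names; it only shows that the cost is near zero for separated sources and clearly positive for a mixture of two sources with different fourth-order statistics.

```python
import numpy as np

def lagged_moment(a, b, k, l):
    """Sample estimate of E[a(t-k) * b(t)^2 * b(t-l)] for lags k, l >= 0."""
    t = np.arange(max(k, l), len(a))
    return np.mean(a[t - k] * b[t] ** 2 * b[t - l])

def cost(ns1, ns2, n_lags):
    """Sum over both orderings (i != j) and all lag pairs of squared moments."""
    total = 0.0
    for a, b in ((ns1, ns2), (ns2, ns1)):
        for k in range(n_lags):
            for l in range(n_lags):
                total += lagged_moment(a, b, k, l) ** 2
    return total

rng = np.random.default_rng(1)
n = 20_000
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), n)   # unit variance, fourth moment 1.8
s2 = rng.choice([-1.0, 1.0], size=n)           # unit variance, fourth moment 1.0
x1 = (s1 + s2) / np.sqrt(2)                    # uncorrelated but still mixed
x2 = (s1 - s2) / np.sqrt(2)

cost_separated = cost(s1, s2, 3)
cost_mixed = cost(x1, x2, 3)
print(cost_separated, cost_mixed)
```

The mixed pair is decorrelated and normalized, so second-order statistics cannot distinguish it from the separated pair; the fourth-order cost does.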

In the case of two sources, the estimated normalized
sources $NS_1(f)$ and $NS_2(f)$ can be expressed as functions of
the normalized sources, called $S_{1n}(f)$ and $S_{2n}(f)$, with:

(10)  $NS_1(f) = H_1'(f) S_{1n}(f) + H_2'(f) S_{2n}(f)$

(11)  $NS_2(f) = H_1''(f) S_{1n}(f) + H_2''(f) S_{2n}(f)$

The source separation is achieved at frequency bin f when
$H_1'(f)$ and $H_2''(f)$ (or $H_2'(f)$ and $H_1''(f)$) are equal to zero.
The criterion $\Psi$ can also be expressed as a function of $\theta(f)$
and $\phi(f)$, or as a function of the complex gains $H_i'(f)$ and
$H_i''(f)$. The cancellation of $B_{1,2}(f_1,-f_1)$ and $B_{2,1}(-f_1,f_1)$
leads to one or two types of solutions, depending on the
sources. If the sources do not satisfy the following condition
at each frequency bin $f_1$, relative to the fourth-order
statistics of the sources:

(12)  $\sum_{f=0}^{N-1} E\{|S_{1n}(f)|^2 |S_{1n}(f_1)|^2\} = \sum_{f=0}^{N-1} E\{|S_{2n}(f)|^2 |S_{2n}(f_1)|^2\}$

the cancellation of $(B_{1,2}(f_1,-f_1) + B_{2,1}(-f_1,f_1))$ leads to
$H_1'(f_1) H_1''(f_1)^* = 0$ for each frequency bin $f_1$. The
separation is then achieved at each frequency bin. If the
condition (12) holds, several spurious solutions exist
which satisfy:

(13)  $|H_1'(f)|^2 = |H_1''(f)|^2 = 1/2$

However, if we substitute (13) into the function $\Psi$, we find
that these spurious solutions do not cancel it. We can also
show, by studying the minima of $B_{1,2}(f_1,f_2)$ and
$B_{2,1}(f_1,f_2)$, that the separation is not achieved
independently for each frequency bin f. The sources associated
with the identified signals are necessarily the same from one
frequency bin to another. Consequently, the source
separation is ensured by the minimization of the
criterion $\Psi$, a function of two delays k, l, or two frequency
bins $f_1$, $f_2$.

3.2 Case of different sources

In the case of two sufficiently different sources (relative to
their fourth-order statistics), we deduce from §3.1 a
simplified criterion $\Psi$ which achieves the separation at
each frequency bin. It cancels the N following equations:

(14)  $B_{1,2}(f_1,-f_1) + B_{2,1}(-f_1,f_1) = 0$

$\Psi$ is deduced from (14) by:

(15)  $\Psi = \sum_{f=0}^{N-1} |B_{1,2}(f,-f) + B_{2,1}(-f,f)|^2$

After some computations, we obtain:

(16)  $B_{1,2}(f,-f) + B_{2,1}(-f,f) = H_1'(f) H_1''(f)^* F(s(t),f)$

where F(s(t),f) only depends on the fourth-order moments
of the two sources. From the expression (16), we
conclude that it only depends on the coefficients $\theta(f)$ and
$\phi(f)$ at frequency bin f. In order to estimate them, it is
then theoretically equivalent to minimize the function $\Psi$
(15) or to find the solutions which cancel the equation
(14) at frequency bin f. Call $\Psi(f)$ the contribution to the
cost function $\Psi$ at frequency bin f. The function $\Psi(f)$
can be adaptively minimized. After some computations,
we obtain three types of solutions which cancel the
derivative of $\Psi(f)$ with respect to the variables $\theta(f)$ or $\phi(f)$.
The first one leads to $H_1'(f) = 0$, which separates the
sources. The second one gives $|H_1'(f)|^2 = 1/2$, and
we easily verify that these solutions correspond to
maxima of $\Psi(f)$, which are not stable points. The third

type of solutions, which verify $\partial |H_1'(f)| / \partial \theta(f) = 0$ or
[…], contains all the previous points. The

same conclusion is obtained with the derivatives relative
to the variables $\phi(f)$. As a result, the proposed cost
function has no local minima, which ensures that the
proposed criterion may be adaptively minimized. We
obtain the following adaptation laws for the estimates of
$\theta(f)$ and $\phi(f)$ at time t+1, $\theta(f,t+1)$ and $\phi(f,t+1)$:

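(17)  $\theta(f,t+1) = \theta(f,t) - 2\mu \sum_{k,l=0}^{N-1} [E_{13}(k,l)\, B + E_{31}(k,l)\, C]$

$B = ns_2(t)^2\, ns_2(t-l)\, \frac{\partial ns_1(t-k)}{\partial \theta} + ns_1(t-k)\, \frac{\partial (ns_2(t)^2\, ns_2(t-l))}{\partial \theta}$

$C = ns_1(t)^2\, ns_1(t-l)\, \frac{\partial ns_2(t-k)}{\partial \theta} + ns_2(t-k)\, \frac{\partial (ns_1(t)^2\, ns_1(t-l))}{\partial \theta}$

with:

$\frac{\partial NS_1(f)}{\partial \theta} = -\sin(\theta(f,t)) X_1(f) + \cos(\theta(f,t)) \exp(-j\phi(f,t)) X_2(f)$

$\frac{\partial NS_2(f)}{\partial \theta} = \cos(\theta(f,t)) \exp(j\phi(f,t)) X_1(f) + \sin(\theta(f,t)) X_2(f)$

We obtain similar adaptation laws for $\phi(f,t+1)$, with:

$\frac{\partial NS_1(f)}{\partial \phi} = -j \exp(-j\phi(f,t)) \sin(\theta(f,t)) X_2(f)$

$\frac{\partial NS_2(f)}{\partial \phi} = j \exp(j\phi(f,t)) \sin(\theta(f,t)) X_1(f)$

The unknown moments in (17) are adaptively estimated
with the available data.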

3.3  Case  of  similar  sources 


In the case of two similar sources (relative to their
fourth-order statistics), the dissymmetrical cumulants
$B_{1,2}(f_1,-f_1)$ and $B_{2,1}(f_1,-f_1)$ are zero for any estimated
complex gains $H_i'(f)$ and $H_i''(f)$. We study here the
cancellation of the two-dimensional Fourier Transform of the
symmetrical cumulants $C_{22}(ns_i(t), ns_i(t), ns_j(t-k), ns_j(t-l))$,
functions of two delays. The cancellation at frequency bins
$(f_1,-f_1)$ is equal to:


(17)  S 
f 


|H"  l(f  Dp  |H'  1  (f  )p  +  |H"  2(f  Dp  |H'  2(f  )p 


=  0 


E{|S(f  )p|S(f  Dp}  - 1  -  4|H"  l(f  Dp  |H'l  (f  D| 

From (17), we can deduce that the separation is achieved
under several conditions. If $E\{|S(f)|^2 |S(f_1)|^2\} - 1 = 0$ (which
is the case for sinusoids and rotating machine noises [13]),
the separation is obtained at each frequency bin $f_1$.
If $E\{|S(f)|^2 |S(f_1)|^2\} - 1 < 0$, it can be shown that the
separation is jointly obtained for each frequency bin.
Consequently, the estimated temporal sources are actually
independent.
4. Results

The following simulation illustrates the behavior of the
method proposed in §3.3, in the case of convolutive
mixtures of two sources. The two processes are rotating
machine signals. The mixing filters are MA
filters of order 100. We compared the method to existing
algorithms [9] [10], which in that case converge to local
minima. We present the spectral densities of the
observations in Fig. 1-2, of the true sources in Fig. 3-4,
and of the estimated sources in Fig. 5-6. We note the
good correspondence between the spectra in Fig. 3-4 and
Fig. 5-6, which reveals the convergence of the proposed
method towards a good solution.

5  Conclusion 

We focus in this paper on the generalization of the blind
source separation problem to convolutive mixtures of
wide-band sources. Several criteria of source separation are
studied, which are based on the cancellation of different
fourth-order cross-cumulants. For these criteria, we show
under which conditions the separation is achieved. Results on
real data illustrate the proposed methods (separation of
rotating machine noises).
Fig. 1-2 Spectral densities of the observations


Fig. 3-4 Spectral densities of the sources


Fig. 5-6 Spectral densities of the estimated sources
Abstract

Blind equalization and blind deconvolution have been
an important and interesting topic in diverse fields including
data communication, image processing and geophysical
data processing. Recently, Inouye and Habe proposed a
multistage maximization criterion and a single-stage
maximization criterion for attaining the blind equalization of
multichannel linear time-invariant systems. However, their
maximization criteria are subject to several equality
constraints. In this paper, we present new unconstrained
maximization criteria for accomplishing the
blind equalization of multichannel linear time-invariant
systems. Stochastic gradient algorithms are proposed for
solving the unconstrained maximization problems. Simulation
examples are included to examine the performance of the
proposed algorithms.


1  Introduction

Blind equalization and blind deconvolution have been
an important and interesting topic in diverse fields including
data communication, image processing and geophysical data
processing [1]-[3]. Recently, Shalvi and Weinstein
presented several new criteria for blind equalization of
single-channel linear time-invariant systems [2]. Inouye and Habe
extended the Shalvi-Weinstein approach to the multichannel
case [3]. They proposed a multistage maximization
criterion and a single-stage maximization criterion for attaining
the blind equalization of multichannel linear time-invariant
systems. However, their maximization criteria are
subject to several equality constraints. In general,
unconstrained optimization criteria are better than
constrained optimization criteria for the purpose of
achieving the optimization.

In this paper, we present new unconstrained maximization
criteria for accomplishing the blind equalization of
multichannel linear time-invariant systems. Stochastic gradient
algorithms are proposed for solving the unconstrained
maximization problems. Simulation examples are included to
examine the performance of the proposed algorithms.

2  Problem Formulation

Let us consider the system shown in Fig. 1. It is a cascade
connection of an unknown multichannel system followed by
a multichannel equalizer.

[Block diagram: input u(k), unknown system H(z), output y(k), equalizer W(z), equalizer output z(k); combined system G(z)]

Figure 1. Unknown system and equalizer

We make the following assumptions on the system and
the signals involved.

(A1) The unknown system is described by

$$y(t) = \sum_{k=-\infty}^{\infty} H(k)\, u(t-k) \qquad (1)$$

where y(t) is a real/complex n-column output vector, u(t)
is a real/complex n-column input vector, and {H(k)} is
a real/complex n × n matrix sequence called the impulse
response. The system is stable, that is, the impulse response
satisfies the absolute summability condition

$$\sum_{k=-\infty}^{\infty} \|H(k)\| < \infty \qquad (2)$$

(A2) The transfer function defined by

$$H(z) := \sum_{k=-\infty}^{\infty} H(k)\, z^{k} \qquad (3)$$

is of full rank on the unit circle |z| = 1 (this implies it has
no zero on the unit circle).

(A3) The input process {u(t)} is a zero-mean,
non-Gaussian vector process, whose component processes
$\{u_i(t)\}$, i = 1, ..., n, are mutually independent.
Moreover, each component process $\{u_i(t)\}$ is an independently
and identically distributed (i.i.d.) process with variance
$\sigma_{u_i}^2 \neq 0$ and fourth-order cumulant $\kappa_{4,u_i} \neq 0$.

(A4) The equalizer W(z) is described by

$$z(t) = \sum_{k=-\infty}^{\infty} W(k)\, y(t-k) \qquad (4)$$

where z(t) is a real/complex n-column vector, called the
equalizer output, and {W(k)} is a real/complex n × n
matrix sequence. It is assumed that the equalizer W is also
stable.

For the blind equalization of the unknown system, we
cannot observe the input, but can observe only the output.
This implies there are inherent ambiguities in the solution
to the multichannel equalization problem as follows: In
general, we cannot identify the order of the arrangement of
the components $u_1(t), \ldots, u_n(t)$ of the input vector u(t), the
time origin of each component $u_i(t)$, and the magnitude of
each component $u_i(t)$.

Taking these ambiguities into account, the multichannel
blind equalization problem is formulated as finding
an equalizer W such that the transfer function G(z) of
the combined system takes the form

$$G(z) = P \Lambda(z) D \qquad (5)$$

where P is a permutation matrix, $\Lambda(z)$ is a diagonal matrix
with diagonal entries $\Lambda_{ii}(z) = z^{l_i}$, i = 1, ..., n (where $l_i$
is an integer), and D is a constant diagonal matrix.
Moreover, if we know in advance all the magnitudes of the
variances of the components of the input process, we can
constrain the diagonal matrix D in (5) to be a diagonal
matrix whose diagonal entries all have unit magnitude.
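The target form (5) can be checked numerically on finite impulse responses. The sketch below is an illustration only, with hypothetical function names: it convolves an equalizer with a system to obtain the combined impulse response, and tests whether that response has exactly one nonzero coefficient per row and per column, which is what a permutation, per-channel delays and a diagonal scaling produce.

```python
import numpy as np

def cascade(w, h):
    """w and h have shape (taps, n, n). Returns the impulse response of the
    cascade, G(t) = sum_k W(k) H(t - k), with shape (taps_w + taps_h - 1, n, n)."""
    g = np.zeros((w.shape[0] + h.shape[0] - 1,) + w.shape[1:], dtype=complex)
    for k in range(w.shape[0]):
        for m in range(h.shape[0]):
            g[k + m] += w[k] @ h[m]
    return g

def has_target_form(g, tol=1e-9):
    """True if every row and every column of the combined system contains
    exactly one nonzero coefficient over all taps (permutation, delay, scale)."""
    per_entry = (np.abs(g) > tol).sum(axis=0)     # nonzero taps per (i, j)
    if per_entry.max() > 1:
        return False
    return bool(np.all(per_entry.sum(axis=0) == 1) and np.all(per_entry.sum(axis=1) == 1))

identity = np.eye(2)[None]                        # one-tap pass-through equalizer

h_ok = np.zeros((2, 2, 2))                        # swaps channels, delays one, scales
h_ok[0] = [[0.0, 2.0], [0.0, 0.0]]
h_ok[1] = [[0.0, 0.0], [3.0, 0.0]]

h_mix = np.array([[[1.0, 0.5], [0.0, 1.0]]])      # instantaneous mixing
w_inv = np.linalg.inv(h_mix[0])[None]             # its exact inverse

print(has_target_form(cascade(identity, h_ok)))   # ambiguities only
print(has_target_form(cascade(identity, h_mix)))  # not equalized
print(has_target_form(cascade(w_inv, h_mix)))     # equalized
```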

It is said that a stationary random process {u(t)} satisfies
the normalized whitening condition if all the component
processes $\{u_i(t)\}$, i = 1, ..., n, of {u(t)} are white
random processes with unit variance and they are mutually
uncorrelated. When the random process is zero-mean,
this condition is equivalent to $E\{u(t+k)\, u^{*}(t)\} = I \delta(k)$,
where I denotes the identity matrix and $\delta(k)$ denotes the
Kronecker delta.

By the multilinearity property of cumulants, we can derive
the following formula for the components of the equalizer
output vector from (1) and (4) with (A1)-(A4). Let
{G(t)} be the impulse response of the cascade system in
Fig. 1. Then for any $i_1, i_2 \in \{1, 2, \ldots, n\}$,

n  oo 

r=-oo 

For any $i_1, i_2, i_3, i_4 \in \{1, 2, \ldots, n\}$, we have

C4,Zi ,  ,zf^  ,Zi3  (ti  ,  T2  ,  T3  ) 

=  2  £  9idi'r  +  'ri)9tj{r  +  T2) 
j=\  oo 

9hj  {‘T  +  n)9lj  (7-)K4,«3  •  (7) 

3  Blind  Equalization 

To begin with, let us assume that the input process {u(t)}
satisfies the normalized whitening condition, obtained by dividing
each component $\{u_j(t)\}$ by the square root of its variance
$\sigma_{u_j}^2$ to eliminate the magnitude ambiguity. Let Z denote
the set of all integers.

3.1  Constrained  Criteria 

In the previous work [3], the following two maximization
criteria, the multistage maximization criterion (A) and the
single-stage maximization criterion (B), were proposed and
analyzed.

The multistage maximization criterion (A):

(Stage 1): Maximize $|\kappa_{4,z_1}|$ subject to $\sigma_{z_1}^2 = 1$.

(Stage k): Maximize $|\kappa_{4,z_k}|$ subject to $\sigma_{z_k}^2 = 1$ and
$r_{z_i,z_k}(\tau) = 0$ for all $\tau \in Z$ and all i = 1, 2, ..., k-1.
Here k moves successively from 2 to n.

The single-stage maximization criterion (B):

Maximize $\sum_{i=1}^{n} |\kappa_{4,z_i}|$ subject to $r_{z_i,z_i}(\tau) = \delta(\tau)$ for
all i = 1, ..., n and $r_{z_i,z_j}(\tau) = 0$ for all $\tau \in Z$ and all
distinct i, j = 1, ..., n.

Theorem 1: Under the normalized whitening condition of
the input process {u(t)}, the multistage maximization
criterion (A) and the single-stage maximization criterion (B)
both yield a solution to the multichannel blind equalization
problem.
3.2  Unconstrained Criteria

It is generally more difficult to solve a maximization
problem with constraints than to solve an equivalent
constraint-free maximization problem. In the
sequel, we develop constraint-free criteria for solving the
multichannel blind equalization problem.

Let us assume that we know in advance all the magnitudes of the
fourth-order auto-cumulants of the components of the input vector
process and that they satisfy the following decreasing
sequence condition

$$|\gamma_1| > |\gamma_2| > \cdots > |\gamma_n| \qquad (8)$$

where $\gamma_i := \kappa_{4,u_i}$ for i = 1, ..., n. Consider the following
potential function [2] defined by

$$J(z_i) := |\kappa_{4,z_i}| + |\gamma_i| f(\sigma_{z_i}^2) \qquad (9)$$

where f(·) is a continuous real-valued function over [0, ∞)
such that

$$p(x) := x^2 + f(x) \qquad (10)$$

is monotonically increasing in 0 < x < 1, monotonically
decreasing for x > 1, and has a unique maximum at x = 1.
Such a function, for example, is given by $p(x) = 2ax - ax^2$,
a > 0.
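The example function can be checked numerically. The short sketch below evaluates $p(x) = 2ax - ax^2$ on a grid (the grid and the value a = 10 are arbitrary choices for illustration) and confirms the required shape: increasing below x = 1, decreasing above, with maximum value a at x = 1, since $p(x) = a - a(x-1)^2$.

```python
import numpy as np

def p(x, a):
    """Example potential shape p(x) = 2*a*x - a*x**2, with a > 0."""
    return 2.0 * a * x - a * x ** 2

a = 10.0
xs = np.linspace(0.0, 3.0, 301)
vals = p(xs, a)
x_best = xs[np.argmax(vals)]

print(x_best, p(1.0, a))                 # maximizer on the grid and peak value a
print(np.all(np.diff(vals[:101]) > 0))   # increasing on [0, 1]
print(np.all(np.diff(vals[100:]) < 0))   # decreasing on [1, 3]
```

A larger a makes the peak sharper, which penalizes deviations of the output variance from one more strongly.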

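Corresponding to the multistage maximization criterion
(A), we consider the following unconstrained criterion.

The unconstrained multistage maximization criterion
(C):

(Stage 1): Maximize

$$J_1 := |\kappa_{4,z_1}| + |\gamma_1| f(\sigma_{z_1}^2) \qquad (11)$$

(Stage k): Maximize

$$J_k := |\kappa_{4,z_k}| + |\gamma_k| f(\sigma_{z_k}^2) - \lambda_0 \sum_{i=1}^{k-1} \sum_{\tau \in Z} |r_{z_i,z_k}(\tau)|^2 \qquad (12)$$

where $\lambda_0$ is a positive constant greater than
$|\gamma_1|$, i.e., $\lambda_0 > |\gamma_1|$.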

Based  on  Theorem  1,  we  have  the  following  theorem. 

Theorem 2: Under the normalized whitening condition of
the input process {u(t)}, the unconstrained multistage
maximization criterion (C) gives a solution to the multichannel
equalization problem.

Corresponding to the single-stage maximization criterion
(B), we need another assumption for the time being, namely that all
the magnitudes of the fourth-order cumulants are identical,
i.e.,

$$|\gamma_1| = |\gamma_2| = \cdots = |\gamma_n| \qquad (13)$$

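Under this condition, we consider the following
unconstrained criterion.

The unconstrained single-stage maximization criterion
(D):

Maximize

$$J := \sum_{k=1}^{n} \{ |\kappa_{4,z_k}| + |\gamma_k| f(\sigma_{z_k}^2) \} - \lambda_0 \sum_{k=2}^{n} \sum_{i=1}^{k-1} \sum_{\tau \in Z} |r_{z_i,z_k}(\tau)|^2 \qquad (14)$$

where $\lambda_0$ is a positive constant.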

Based  on  Theorem  1,  we  can  obtain  the  following  theo¬ 
rem. 

Theorem 3: Under the normalized whitening condition
of the input process {u(t)} and the condition (13), the
unconstrained single-stage maximization criterion (D) gives a
solution to the multichannel blind equalization problem.

Remark 1: When the magnitudes of the fourth-order
auto-cumulants of the components of the input vector
process are not all the same, the criterion function (14) with $\lambda_0$
being a small positive constant cannot generally be applied
for achieving the multichannel blind equalization. In such
a general case, it is not clear at present how to choose
a large number for $\lambda_0$ in the criterion function (14) to solve
the multichannel blind equalization problem.

4  Simulation  Examples 

In order to see the effectiveness of the proposed criteria,
we developed two stochastic gradient algorithms for solving
the problem of the multistage maximization criterion (A) and
the unconstrained multistage maximization criterion (C).
They are omitted because of the page limit. The algorithm for
criterion (A) requires (multichannel) spectral prewhitening of the
output process of the unknown system. We used a finite impulse
response (FIR) system to approximate the equalizer.

We took the following 2-input, 2-output
all-pass system described by

(15)

We note that H(z) satisfies the all-pass condition (the
product of H(z) with its conjugate transpose equals I on the
unit circle). Hence we need not perform prewhitening
in this case. The first channel input signal $u_1(t)$ was
16-QAM with unit variance, and the second channel input
signal $u_2(t)$ was 4-PSK (phase-shift keying) with unit
variance. We used a 2-input, 2-output, 24-tap equalizer
W(z). Both algorithms involve expectations.
Therefore, we used 50 data points to calculate each expectation.
The step size was chosen to be 0.02. The positive constant
$\lambda_0$ in (12) was set to 1. As a measure of performance we
used the multichannel intersymbol interference, denoted
by Misi, defined in [3]. The initial Misi in the logarithmic
(dB) scale was 8.0411 dB.


Figure 2. Signal constellations after equalization:
(a) equalized output of channel 1 using criterion (A);
(b) equalized output of channel 2 using criterion (A);
(c) equalized output of channel 1 using criterion (C);
(d) equalized output of channel 2 using criterion (C).

Both algorithms were tested in 10 Monte Carlo runs
using 20,000 data samples at each of the two channel outputs.
Fig. 2 shows the equalized signal constellations obtained by
using the constrained criterion (A) and the unconstrained
criterion (C) with a = 10, respectively. Since the magnitude
of the fourth-order cumulant of the 4-PSK signal is greater
than that of the 16-QAM signal, the 4-PSK signal was recovered
as the first channel output $z_1(t)$ at Stage 1 and the 16-QAM
signal as the second channel output $z_2(t)$ at Stage 2. We see
from Fig. 2 that the equalized output of channel 1 using the
unconstrained criterion (C) converges better than that using the
constrained criterion (A), though there is no clear difference
between the two equalized outputs of channel 2 using
the constrained criterion (A) and using the unconstrained
criterion (C).


Figure 3. Performances of the algorithms of
the constrained criterion (A) and the unconstrained
criterion (C). The solid line (a) denotes
$\langle M_{\mathrm{ISI}} \rangle$ using the constrained criterion
(A), and the dashed line (b) denotes $\langle M_{\mathrm{ISI}} \rangle$
using the unconstrained criterion (C) with
$\alpha = 10$.


In Fig. 3, we plotted the averaged $M_{\mathrm{ISI}}$, denoted by
$\langle M_{\mathrm{ISI}} \rangle$, over 10 Monte Carlo runs. Comparing the
constrained criterion (A) with the unconstrained criterion
(C), we found through simulations that the unconstrained
criterion (C) exhibits better convergence behavior than the
constrained criterion (A), except for the case of $\alpha = 1$.
Therefore, the value of $\alpha$ should be chosen greater than 1.

5  Conclusions 

We have proposed the unconstrained multistage maximization
criterion and the unconstrained single-stage maximization
criterion. Simulation examples have been shown to illustrate
the performance of the algorithm for the constrained
multistage criterion (A) and of the algorithm for the
unconstrained multistage criterion (C). We have not yet
developed stochastic gradient algorithms for the
single-stage maximization criterion (B) and the
unconstrained single-stage maximization criterion (D).
Two unknown non-white stochastic sources (e.g., speech
signals) are dynamically mixed by an unknown multipath
channel and subsequently measured by two sensors.
The objective is to construct an inverse filter
that separates the two signals, based only on their
independence. It is known that, under certain conditions,
second-order statistics provide sufficient information to
identify the filter. In contrast to the usual cost function
optimization techniques, we propose an algorithm
that computes the filter coefficients algebraically, using
linear algebra techniques such as the singular value
decomposition.

Keywords:  stochastic  signal  separation 


1.  Introduction 

We consider the problem of separating two mutually
uncorrelated non-white stochastic sources jointly received
over two unknown multipath channels. A number of papers
have been published in this context, under various
assumptions on the signals or the channels,
and using various techniques; see [2-5, 7-9, 12, 13] and
the references therein. Techniques may broadly be classified
as (a) block-methods based on high-order statistics
(second and fourth-order cumulants), (b) adaptive
methods based on optimization of a blind cost function
(or nonlinear contrast function), (c) maximum-likelihood
estimation, presuming the source distributions
are known. In many cases, a limited scenario
with only scalar mixtures is considered.

The algorithm proposed in this paper is a block-method
based on second-order statistics of the measurement
data only. The parameters of the inverse filter
are to be found such that the resulting filtered output
signals $y_1(t)$ and $y_2(t)$ have zero cross-covariance
function. Assuming a certain filter structure, the resulting
conditions take the form of bilinear equations.


Figure  1.  Separation  scenario 

The usual approach at this point is to set up a cost
function whose minimum coincides with the solution
of the equations, and to apply a stochastic gradient or
Newton-type search algorithm to find the minimum [5].
Our main point is the observation that the equations
can also be solved algebraically, via a singular value
decomposition (SVD). This gives an exact solution to
the problem in case the covariance data is exact. With
estimated covariances, a subsequent step is needed, in
which we have to find a linear combination of a collection
of matrices such that the result has rank 1. A
similar problem arose in the context of separation of
constant-modulus signals [11].

2.  Problem  formulation 

The data model that we consider in this paper is depicted
in figure 1. The source signals are $s_1(n)$ and
$s_2(n)$, which are linearly filtered white noise processes
$\xi_1(n)$, $\xi_2(n)$. We make the following assumptions:

C1: $\xi_1(n)$ and $\xi_2(n)$ are realizations of mutually
uncorrelated identically distributed sequences with
non-zero variance and zero mean.

C2: $s_1(n)$ and $s_2(n)$ are generated by convolving $\xi_1(n)$
and $\xi_2(n)$ with two different asymptotically stable
rational filters.
The source signals are measured via an unknown multichannel,
with outputs $x_1(n)$ and $x_2(n)$. The structure
of the channel is supposed to consist of a single
direct path for the transfer of $s_1$ to $x_1$ and $s_2$ to $x_2$,
and short FIR multipaths $B_1(q^{-1})$ and $B_2(q^{-1})$ for the
crosscoupling $s_1$ to $x_2$ and $s_2$ to $x_1$. The objective is
to retrieve $s_1(n)$, $s_2(n)$ from $x_1(n)$, $x_2(n)$. This can be
done in a two-step procedure where step one is a separation
and step two a post-filtering: (1) from $x_1$, $x_2$, find
$y_1(n) = Hs_1(n)$ and $y_2(n) = Hs_2(n)$, where $H(q^{-1})$ is
some FIR filter; (2) inverse filter the sequences $Hs_1(n)$
and $Hs_2(n)$ with $H^{-1}(q^{-1})$ to retrieve $s_1$ and $s_2$. Here,
we focus on the first step: the actual signal separation.

The separation structure is a direct feedforward filter
as depicted in figure 1, where $D_1(q^{-1})$ and $D_2(q^{-1})$
are adaptive FIR filters. When $D_1 = B_1$, $D_2 = B_2$,
the separation structure is equal to the channel inverse
times the filter $H(q^{-1}) = 1 - D_1(q^{-1})D_2(q^{-1})$, in which
case the filter outputs $y_1$, $y_2$ are equal to $Hs_1$, $Hs_2$.
More generally,

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} 1 - B_2 D_1 & B_1 - D_1 \\ B_2 - D_2 & 1 - B_1 D_2 \end{bmatrix}
\begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$

where $\theta = [d_{10} \cdots d_{1,U-1}\ d_{20} \cdots d_{2,V-1}]^T = [d_1^T\ d_2^T]^T$
is the parameter vector of the separation
structure. To enable separation, the filter lengths $U$, $V$
of $D_1$ and $D_2$ should be at least as large as the channel
lengths, $L_1$ and $L_2$. This is only possible if the natural
assumption

C3: $L_1 \le U$ and $L_2 \le V$

is introduced. Condition C3 assures that the separation
structure is in the model class.
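The mixing and separation structure described above can be sketched numerically as follows. This is a minimal illustration only: it assumes the convention $x_1 = s_1 + B_1 s_2$, $x_2 = s_2 + B_2 s_1$ implied by the matrix relation above, white test signals in place of the non-white sources, and hypothetical helper names (`fir`, `mix`, `separate`). With $D_1 = B_1$, $D_2 = B_2$ the outputs should equal $Hs_1$, $Hs_2$ with $H = 1 - B_1B_2$.

```python
import numpy as np

def fir(coeffs, x):
    """Causal FIR filtering with zero initial conditions, truncated to len(x)."""
    return np.convolve(coeffs, x)[:len(x)]

def mix(s1, s2, b1, b2):
    """Assumed channel: direct paths plus FIR cross-coupling B1 (s2 -> x1), B2 (s1 -> x2)."""
    return s1 + fir(b1, s2), s2 + fir(b2, s1)

def separate(x1, x2, d1, d2):
    """Feedforward separation structure: y1 = x1 - D1 x2, y2 = x2 - D2 x1."""
    return x1 - fir(d1, x2), x2 - fir(d2, x1)

rng = np.random.default_rng(0)
s1 = rng.standard_normal(200)            # stand-in source signals
s2 = rng.standard_normal(200)
b1 = np.array([0.5, -0.1])               # illustrative channel coefficients
b2 = np.array([0.7, 0.3])

x1, x2 = mix(s1, s2, b1, b2)
y1, y2 = separate(x1, x2, b1, b2)        # ideal case D1 = B1, D2 = B2

# H(q^-1) = 1 - B1(q^-1) B2(q^-1)
h = -np.convolve(b1, b2)
h[0] += 1.0
```

In this ideal case `y1` coincides with `fir(h, s1)` and `y2` with `fir(h, s2)`, so each output depends on one source only.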

In order to recover the sources we require that

C4: $H(q^{-1})$ is minimum phase.

This is natural since $H(q^{-1})$ has a stable inverse only
if it is minimum phase. The condition C4 is fulfilled if
$|B_1(e^{i\omega})B_2(e^{i\omega})| < 1$ for all $\omega \in [0, 2\pi]$, cf. [1].

3.  An  algebraic  separation  algorithm 

The proposed algorithm is based on finding the coefficients
$\theta$ of the separation filter such that the filter
outputs $y_1$ and $y_2$ are mutually uncorrelated. Let
$R_{y_1y_2}(l) = E\{y_1(n)\,y_2(n-l)\}$ be the cross-correlation
between the filtered signals. We will only force independence
with respect to second order statistics, i.e.
the cross-correlation of $y_1$ and $y_2$ is equal to zero for a
selected number of $(2L+1)$ lags [7]:

$$R_{y_1y_2}(l) = 0, \qquad -L \le l \le L. \qquad (2)$$
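In practice the cross-correlation lags have to be estimated from data. The following sketch shows one simple (biased) sample estimator of $E\{a(n)\,b(n-l)\}$ and the vector of $2L+1$ lags that the decorrelation condition drives to zero. The function names are hypothetical and the normalization by the record length is an assumption, not something specified in the text.

```python
import numpy as np

def cross_corr(a, b, lag):
    """Biased sample estimate of E{a(n) b(n - lag)} for equal-length records."""
    n = len(a)
    if lag >= 0:
        return float(np.dot(a[lag:], b[:n - lag])) / n
    return float(np.dot(a[:n + lag], b[-lag:])) / n

def decorrelation_residual(y1, y2, L):
    """Estimated cross-correlation of y1, y2 at lags -L..L (all zero when separated)."""
    return np.array([cross_corr(y1, y2, l) for l in range(-L, L + 1)])

# Tiny deterministic example: b is a unit pulse at n = 0
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 0.0, 0.0, 0.0])
res = decorrelation_residual(a, b, L=1)
```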


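The cross-correlation $R_{y_1y_2}(l)$ is, under assumption C1
and C2, given in terms of the measured data $x_1$, $x_2$ as

$$R_{y_1y_2}(l) = R_{x_1x_2}(l) - d_1^T r_{x_2x_2}(l) - d_2^T r_{x_1x_1}(l)
+ d_1^T R_{x_2x_1}(l)\, d_2, \qquad (3)$$

where

$$r_{x_1x_1}(l) = [R_{x_1x_1}(l) \ \cdots \ R_{x_1x_1}(l+V-1)]^T \qquad (4)$$

$$r_{x_2x_2}(l) = [R_{x_2x_2}(l) \ \cdots \ R_{x_2x_2}(l-U+1)]^T \qquad (5)$$

$$R_{x_2x_1}(l) = [r_{x_2x_1}(l) \ \cdots \ r_{x_2x_1}(l+V-1)] \qquad (6)$$

$$r_{x_2x_1}(l) = [R_{x_2x_1}(l) \ \cdots \ R_{x_2x_1}(l-U+1)]^T \qquad (7)$$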

Thus, the separation problem reduces to solving a system
of bilinear equations. In [5] it is proven that there
are at least as many equations as unknowns under
conditions C1-C3, with the exception of the static channel
and white sources. By adding the condition C4, the
problem becomes parameter identifiable
(apart from static channels), cf. [6].

The equations (3) with left hand side equal to zero
can be solved iteratively, in conjunction with a criterion,
by means of gradient minimization techniques,
cf. [5]. However, since such techniques are usually
troubled by local minima and require accurate initial
points, it is interesting also to consider an exact solution
of the equations, as follows.

The idea is to rewrite the bilinear equations (3) in
matrix form, using Kronecker products to collect all
unknowns into a single (structured) parameter vector.
This produces

$$\begin{bmatrix} R_{y_1y_2}(-L) \\ \vdots \\ R_{y_1y_2}(L) \end{bmatrix}
= P \begin{bmatrix} d_2 \otimes d_1 \\ d_2 \\ d_1 \\ 1 \end{bmatrix} \qquad (8)$$

where '$\otimes$' is the Kronecker product, and

$$P = \begin{bmatrix}
\mathrm{vec}(R_{x_2x_1}(-L))^T & -r_{x_1x_1}(-L)^T & -r_{x_2x_2}(-L)^T & R_{x_1x_2}(-L) \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{vec}(R_{x_2x_1}(L))^T & -r_{x_1x_1}(L)^T & -r_{x_2x_2}(L)^T & R_{x_1x_2}(L)
\end{bmatrix}.$$

'vec' denotes the vectoring operation which stacks all
columns of a matrix into a single column. Thus, the
problem is equivalent to finding a vector with a certain
structure in the null space of the data matrix $P$. This
null space can be determined, or estimated, by a singular
value decomposition of $P$. Thus let a basis for
the null space be given by $\{v_1, \cdots, v_\delta\}$, where $\delta$ is the
dimension of the null space. Since the precise basis is
arbitrary, the problem is to find a linear combination
of these vectors such that we obtain a vector with the
required structure, i.e. to find $\lambda_1, \cdots, \lambda_\delta$ such that

$$\lambda_1 v_1 + \cdots + \lambda_\delta v_\delta =
\begin{bmatrix} d_2 \otimes d_1 \\ d_2 \\ d_1 \\ 1 \end{bmatrix}. \qquad (9)$$


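To make this equivalent problem more tractable, we
move from vectors to matrices. For a vector $x$ partitioned
as

$$x = [x_1^T \ x_2^T \ x_3^T \ x_4]^T \qquad (10)$$
$$\phantom{x} = [x_{11} \cdots x_{1,UV} \ \ x_{21} \cdots x_{2V} \ \ x_{31} \cdots x_{3U} \ \ x_4]^T$$

define the operator

$$\mathrm{mat}(x) := \begin{bmatrix} \mathrm{vec}^{-1}(x_1) & x_2 \\ x_3^T & x_4 \end{bmatrix} \qquad (11)$$

where vec(M) is a vectorization of the matrix M and
$\mathrm{vec}(\mathrm{vec}^{-1}(x_1)) = x_1$. Note that mat applied to the
structured vector in (9) gives the rank-one matrix
$\begin{bmatrix} d_2 \\ 1 \end{bmatrix} [d_1^T \ 1]$.

Denote $V_1 = \mathrm{mat}(v_1), \cdots, V_\delta = \mathrm{mat}(v_\delta)$. Equation
(9) is equivalent to finding $\lambda_1, \cdots, \lambda_\delta$ such that

$$V_\lambda := \lambda_1 V_1 + \cdots + \lambda_\delta V_\delta =
\begin{bmatrix} d_2 \\ 1 \end{bmatrix} [d_1^T \ 1]. \qquad (12)$$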

Basically, we have to select $\lambda_k$'s such that the resulting
linear combination of matrices $V_\lambda$ is rank 1, in which
case it can always be scaled and factored into the required
structure.

What is the value of $\delta$? At first sight, given enough
conditions (lags) we would expect $\delta = 1$, since the
solution to the separation problem is usually unique.
However, the Toeplitz structure of $R_{x_2x_1}$ adds extra
vectors to the null space of $P$: certain columns
of $P$ are duplicated, which reduces its rank. The
number of repeated entries in the Toeplitz matrix is
$UV - (U + V - 1) = (U - 1)(V - 1)$, so that we expect
$\delta = 1 + (U - 1)(V - 1)$. The resulting null space basis
also has structure: e.g. for $U = 3$, $V = 3$, a possible
matrix basis is of the form


Figure  2.  Singular  values  of  P 

If we simply remove the duplicate columns of $P$, then
the 'trivial' null space solutions ($V_1, \cdots, V_4$) are suppressed.
Only one vector in the null space is left, corresponding
to $V_5$ in (13). Hence, estimates of $d_1$, $d_2$ are
immediately available, even without solving the rank-1
problem.
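The exact-data case can be illustrated with a short sketch. It is not the authors' implementation: it assumes the mixing convention $x_1 = s_1 + B_1 s_2$, $x_2 = s_2 + B_2 s_1$, unit-variance white driving noise, and covariances $R_{ab}(l) = E\{x_a(n)x_b(n-l)\}$ computed from truncated impulse responses; all function names are hypothetical. Duplicate columns are merged by letting one unknown stand for each sum of products $d_{1i}d_{2j}$ with equal $j - i$, after which the null vector is read off the SVD. The example values are those of the simulation section (pole radius 0.8, angles $\pm\pi/4$ and $\pm 3\pi/4$, $U = V = 2$, $L = 4$).

```python
import numpy as np

def ar2_impulse(radius, angle, n):
    """Impulse response of 1 / (1 - 2 r cos(angle) q^-1 + r^2 q^-2), first n taps."""
    a1, a2 = 2.0 * radius * np.cos(angle), -radius ** 2
    g = np.zeros(n)
    for k in range(n):
        g[k] = 1.0 if k == 0 else 0.0
        if k >= 1:
            g[k] += a1 * g[k - 1]
        if k >= 2:
            g[k] += a2 * g[k - 2]
    return g

def lag_sum(ha, hb, lag):
    """sum_k ha[k] * hb[k - lag] for equal-length impulse responses."""
    n = len(ha)
    if lag >= 0:
        return float(np.dot(ha[lag:], hb[:n - lag]))
    return float(np.dot(ha[:n + lag], hb[-lag:]))

def exact_covariances(b1, b2, n=400):
    """Return R(a, b, lag) = E{x_a(n) x_b(n - lag)} (0-based sensor indices)."""
    g1 = ar2_impulse(0.8, np.pi / 4, n)
    g2 = ar2_impulse(0.8, 3 * np.pi / 4, n)
    # h[a][c]: impulse response from white source c to sensor a
    h = [[g1, np.convolve(b1, g2)[:n]],
         [np.convolve(b2, g1)[:n], g2]]
    return lambda a, b, lag: sum(lag_sum(h[a][c], h[b][c], lag) for c in range(2))

def solve_separation(R, U, V, L):
    """Null vector of the data matrix with duplicate columns merged."""
    rows = []
    for l in range(-L, L + 1):
        rows.append([R(1, 0, l + k) for k in range(-(U - 1), V)]  # merged products
                    + [-R(0, 0, l + j) for j in range(V)]         # multiplies d2
                    + [-R(1, 1, l - i) for i in range(U)]         # multiplies d1
                    + [R(0, 1, l)])                               # multiplies 1
    P = np.array(rows)
    _, sing, Vt = np.linalg.svd(P)
    z = Vt[-1] / Vt[-1, -1]              # scale so that the last entry equals 1
    n_prod = U + V - 1
    d2 = z[n_prod:n_prod + V]
    d1 = z[n_prod + V:n_prod + V + U]
    return d1, d2, sing

R = exact_covariances(np.array([0.5, -0.1]), np.array([0.7, 0.3]))
d1_hat, d2_hat, sing = solve_separation(R, U=2, V=2, L=4)
```

With exact covariances the recovered `d1_hat` and `d2_hat` should match the channel coefficients; with estimated covariances the smallest singular value is no longer zero and the estimate degrades accordingly.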

The above is true only for perfect knowledge of the
covariance lags, i.e. for an infinite amount of data.
In actuality, the estimates of these lags converge only
slowly to their true values, and the null space is not
well-determined. For accuracy reasons it is usually
necessary to overestimate the value of $\delta$, and actually
search for $\lambda_1, \cdots, \lambda_\delta$ that produce a $V_\lambda$ in (12) that is
as close to rank 1 as possible. This is reminiscent of
the problem considered (and solved) in [11], where it is
shown how a simultaneous diagonalization of (square)
matrices $V_1, \cdots, V_\delta$ provides good estimates of the
$\lambda_k$'s. The simulation results reported in section 4 are
based on a blunt application of this diagonalization
algorithm, followed by a few steps of an optimization
routine to improve the $\lambda_k$'s. Although the results are
reasonable, it should be remarked that the diagonalization
algorithm is theoretically not well motivated
for this application, because unlike the case in [11], we
now expect only one solution $[\lambda_1 \cdots \lambda_\delta]$, rather than
$\delta$ independent solutions. This means that the $V_k$ need
not be simultaneously diagonalizable.

4.  Simulations 

We investigate the performance of the algorithm by
simulation. In accordance with conditions C1 and
C2, the source signals $s_1(n)$ and $s_2(n)$ are generated
by filtering two mutually uncorrelated white Gaussian
noise sequences through two autoregressive filters.
One filter has a complex pole pair at radius
0.8 and angle $\pm\pi/4$; the other filter has a radius of
0.8 and angle $\pm 3\pi/4$. The channel in this simulation
consists of two filters $B_1(q^{-1}) = 0.5 - 0.1q^{-1}$
and $B_2(q^{-1}) = 0.7 + 0.3q^{-1}$. The correlation matrices
(4)-(7) are estimated from N = 500, 2000, and 4000
samples of $x_1$, $x_2$. We took L = 4 lags into account,
which gives a total of 9 equations for 4 unknowns. The
Cramer-Rao Bound (CRB) for this scenario is derived
as $N\,\mathrm{Var}\,\hat\theta$ = [0.067, 0.065, 0.052, 0.052], cf. [10].

Table 1. Mean value and variance of the estimated filter coefficients

                         Mean                         Variance x 10^-3
Method     N       d10     d11     d20     d21     d10     d11     d20     d21
True/CRB   500     0.5     -0.1    0.7     0.3     0.134   0.130   0.104   0.104
           2000                                    0.034   0.033   0.026   0.026
           4000                                    0.017   0.016   0.013   0.013
Algebraic  500     0.499   -0.098  0.701   0.299   0.691   0.719   2.15    1.88
           2000    0.502   -0.098  0.697   0.297   0.162   0.157   0.480   0.481
           4000    0.500   -0.100  0.700   0.299   0.086   0.086   0.232   0.229
Recursive  500     0.500   -0.100  0.697   0.302   0.718   0.582   1.21    2.57
           2000    0.500   -0.100  0.700   0.300   0.043   0.048   0.032   0.035
           4000    0.500   -0.100  0.700   0.300   0.017   0.016   0.016   0.016
Weinstein  500     0.748   0.074   0.748   0.419   1677    873     1677    254
           2000    0.651   0.024   0.651   0.480   62.3    276     623     47.5
           4000    0.668   0.055   0.668   0.493   1334    104     1334    101
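The bound quoted above is stated as $N$ times the variance, so the per-sample-size bound follows by dividing by $N$. A minimal sketch of that arithmetic (the function name `crb_variance` is hypothetical):

```python
def crb_variance(n_times_var, n_samples):
    """Convert the normalized bound N*Var into a variance bound for n_samples samples."""
    return [v / n_samples for v in n_times_var]

# Normalized bound for [d10, d11, d20, d21] as quoted in the text
N_VAR = [0.067, 0.065, 0.052, 0.052]

bounds_500 = crb_variance(N_VAR, 500)    # bound on each coefficient variance at N = 500
```

For N = 500 the first entry is 0.067/500 = 1.34e-4, i.e. 0.134 when expressed in units of 10^-3.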

A total of 200 independent runs were performed for
each sample size. The estimated mean value and parameter
variance for the present algebraic algorithm
are given in table 1, along with two other algorithms.
The “recursive” algorithm is basically a stochastic
Newton search algorithm based on [5], and the “Weinstein”
algorithm is the one found in [13].

For the algebraic algorithm, theoretically $\delta = 2$, but
we have used $\delta = 3$ because even for N = 4000 there
is no clear gap between the large and small singular
values, as is illustrated in figure 2. Even so, the algebraic
algorithm performed worse than the recursive
method and did not reach the CRB. Given exact
(rather than estimated) covariance data, $P$ does have
precisely two zero singular values, and the algorithm
did produce the exact solution.

It is known that the “Weinstein” algorithm cannot
separate the sources unless $b_{10} = b_{20}$, and this shows
up in the results. For a scenario where $b_{10} = b_{20}$ the
algorithm works, but yields estimates with a higher
variance than the other two algorithms. Note that the
variance for “Weinstein” is larger for N = 4000 than
for N = 2000. This is due to large deviations for some,
typically two, parameter trajectories.
Abstract

This paper considers some aspects of the source separation
problem. Unmeasurable source signals are assumed
to be mixed by means of a channel system resulting
in measurable output signals. These output signals
can be used to determine a separation structure in order
to extract the sources. When solving the source separation
problem the channel filter parameters have to be
estimated. This paper presents a compact and computationally
appealing formula for computing a lower
bound for the variance of these parameters, in a general
Many Inputs Many Outputs scenario. This lower
bound is the asymptotic (assuming the number of data
samples to be large) Cramer-Rao lower bound. The
CRLB formula is developed further for the two-input
two-output system and compared with the results from
a Recursive Prediction Error Method.


1  Introduction 

The problem of separating two signals that are
mixed through an unknown dynamic channel is considered.
Both the source generating filters and the mixing
channels are modeled as ARMA-filters. This model is
realistic in applications such as hands-free and hand-held
mobile telephony in the presence of acoustic interference.
Noise reduction in hearing-aids is another
application.

A lower bound for the covariance matrix of unbiased
parameter estimates is given by the Cramer-Rao
Lower Bound (CRLB) [3, 8]. The Prediction Error
Method (PEM) is, for Gaussian distributed disturbances,
asymptotically efficient, cf. [5]. This means that
the covariance matrix of the estimated parameters is
asymptotically equal to the CRLB.

*This work was financially supported by the Swedish Research
Council for Engineering Sciences (TFR) and the Swedish
National Board for Industrial and Technical Development
(NUTEK).

Source  separation  is  an  intensive  area  of  research  and 
in  the  past  years  many  algorithms  have  been  presented 
[2,  6,  7,  4].  It  is  of  great  interest  to  compute  the  CRLB 
for  the  source  separation  problem,  since  it  provides  a 
bench-mark  to  compare  algorithms. 

2  Problem  formulation 

Figure 1 depicts the scenario under consideration.
Unmeasurable source signals, $x_1$ and $x_2$, are mixed by
means of a channel system resulting in measurable output
signals $y_1$ and $y_2$. These output signals can be used
to determine a separation structure in order to extract
the sources. A two-input two-output (TITO) system is
used to specify the source separation problem.


Figure  1.  The  data  generating  system. 


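It is assumed that the source signals, $x_1$ and $x_2$, can
be modeled as finite order ARMA processes, and that
the channel filters have rational transfer functions. The
resulting equations can be put in a matrix form as

$$\begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix} =
\begin{bmatrix} \dfrac{G_1}{F_1} & \dfrac{G_2 B_2}{F_2 A_2} \\[2mm]
\dfrac{G_1 B_1}{F_1 A_1} & \dfrac{G_2}{F_2} \end{bmatrix}
\begin{bmatrix} \xi_1(t) \\ \xi_2(t) \end{bmatrix} \qquad (1)$$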


where $A_1$, $A_2$ etc. are polynomials in the unit delay operator
$q^{-1}$ and the signals $y_1$, $y_2$ etc. are functions of
the discrete time variable t = 1, 2, ... (To simplify notation
the dependence on $q^{-1}$ and t is omitted whenever
appropriate.) Without any restriction $F_1$, $F_2$, $A_1$, and
$A_2$ are assumed to be monic and minimum phase, and
$G_1$ and $G_2$ to be minimum phase with nonzero direct
terms. Also, the two driving sequences $\xi_1$ and $\xi_2$ are
assumed to be zero mean, unit variance, mutually uncorrelated
white Gaussian noises.

Denote $B_1(q^{-1}) = b_{10} + b_{11}q^{-1} + \ldots + b_{1m}q^{-m}$,
$B_2(q^{-1}) = b_{20} + b_{21}q^{-1} + \ldots + b_{2m}q^{-m}$, etc. Let,
for k = 1, 2, $\sigma_k = G_k(0)$, and introduce the following
scaled versions of $\{G_k\}$ and $\{\xi_k\}$: $G_k \leftarrow G_k/\sigma_k$,
$\xi_k \leftarrow \sigma_k \xi_k$. (For convenience we use the same notation for the
scaled and un-scaled $\{G_k\}$ polynomials.)

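The system in (1) can be rewritten as (assuming
$b_{10} b_{20} \neq 1$)

$$y(t) = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\underbrace{\begin{bmatrix} 1 & \dfrac{B_2}{A_2} \\[2mm] \dfrac{B_1}{A_1} & 1 \end{bmatrix}
\begin{bmatrix} \dfrac{G_1}{F_1} & 0 \\[2mm] 0 & \dfrac{G_2}{F_2} \end{bmatrix}
\begin{bmatrix} 1 & b_{20} \\ b_{10} & 1 \end{bmatrix}^{-1}}_{H}
\begin{bmatrix} \xi_1 + b_{20}\xi_2 \\ b_{10}\xi_1 + \xi_2 \end{bmatrix}
= H(q^{-1})\,e. \qquad (2)$$

The purpose of the latter manipulation is that H(0) =
I, which simplifies the following derivations.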


3  A  compact  expression  of  the  CRLB 
for  MIMO-systems 


In this section a compact expression for the CRLB
is derived. It should be noted that the derivation and
formulae are independent of the number of source signals,
i.e. the results can be used for a general Many
Inputs Many Outputs (MIMO) system.

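Lemma 1

Consider a transfer matrix $H(q^{-1})$ with $y = He$ and
$\Lambda = E\{ee^T\}$. Assume that $H$ and $H^{-1}$ are stable,
$H(0) = I$, and $e(t)$ is white noise. Parameterize
the model with the vector $\theta$ which contains the
unknown coefficients of $H$ and $\Lambda$.

Denote

$$H_k = \frac{\partial H}{\partial \theta_k}, \qquad \Lambda_k = \frac{\partial \Lambda}{\partial \theta_k}$$

and let the number of samples be N.

Then, for $N \gg 1$, the CRLB is given by

$$\mathrm{CRLB} = J^{-1}, \qquad (3a)$$

where J is the Fisher information matrix:

$$[J]_{k,l} = N\,\mathrm{Tr}\left\{ 0.5\,\Lambda_k \Lambda^{-1} \Lambda_l \Lambda^{-1}
+ \mathrm{Real}\left[ \frac{1}{2\pi i} \oint H_k \Lambda H_l^{*} H^{-*} \Lambda^{-1} H^{-1} \frac{dz}{z} \right] \right\} \qquad (3b)$$

and where * denotes conjugate transpose.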


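Proof. Consider Whittle's formula [8] for the asymptotic
information matrix

$$[J]_{k,l} = \frac{N}{4\pi i} \oint \mathrm{Tr}\left\{ \frac{\partial \phi(z)}{\partial \theta_k}\,\phi^{-1}(z)\,
\frac{\partial \phi(z)}{\partial \theta_l}\,\phi^{-1}(z) \right\} \frac{dz}{z} \qquad (4)$$

where $\phi(z)$ denotes the spectral density completely defined
by the unknown vector $\theta$. The spectral density
can be written as $\phi(z) = H \Lambda H^{*}$. Taking the derivative
of $\phi(z)$ and $\Lambda$ with respect to an arbitrary element in
$\theta$ and inserting into (4) yields

$$[J]_{k,l} = \frac{N}{4\pi i} \oint \mathrm{Tr}\Big\{
(H_k \Lambda H^{*} + H \Lambda_k H^{*} + H \Lambda H_k^{*})\, H^{-*} \Lambda^{-1} H^{-1}\,
(H_l \Lambda H^{*} + H \Lambda_l H^{*} + H \Lambda H_l^{*})\, H^{-*} \Lambda^{-1} H^{-1} \Big\} \frac{dz}{z}$$

$$= \frac{N}{4\pi i} \oint \mathrm{Tr}\Big\{
H^{-1} H_k H^{-1} H_l + (H^{-1} H_k H^{-1} H_l)^{*}
+ \Lambda^{-1} H^{-1} H_k \Lambda_l + (\Lambda^{-1} H^{-1} H_k \Lambda_l)^{*}
+ \Lambda_k \Lambda^{-1} \Lambda_l \Lambda^{-1}$$
$$\qquad + \Lambda^{-1} H^{-1} H_l \Lambda_k + (\Lambda^{-1} H^{-1} H_l \Lambda_k)^{*}
+ \Lambda^{-1} H^{-1} H_k \Lambda H_l^{*} H^{-*}
+ (\Lambda^{-1} H^{-1} H_k \Lambda H_l^{*} H^{-*})^{*} \Big\} \frac{dz}{z}. \qquad (5)$$

The integral above can be considered as the inverse z
transform evaluated at t = 0. For the causal terms the
initial value theorem can then be used, i.e. the value
of the inverse z-transform evaluated at t = 0 is equal
to the value at $z^{-1} = 0$. Since H(0) = I, the terms
1, 2, 3, 4, 6, 7 are all zero. Term 5 is independent of
z and can be moved outside the integral, and equation
(3b) follows. □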
4 Computational aspects


The expression (3) is still very tedious for computation.
The computations can be reduced using some of
the symmetry of the source separation problem. In this
section a TITO system will be considered. Parameterize
the problem with the vector $\theta$ which contains the coefficients
of $B_1$, $A_1$, $G_1$, $F_1$, $\sigma_1$, $B_2$, $A_2$, $G_2$, $F_2$, and $\sigma_2$
in this order.

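Consider the model in (2). The transfer matrix H can
be divided into three parts

$$H = C G D^{-1} =
\begin{bmatrix} 1 & \dfrac{B_2}{A_2} \\[2mm] \dfrac{B_1}{A_1} & 1 \end{bmatrix}
\begin{bmatrix} \dfrac{G_1}{F_1} & 0 \\[2mm] 0 & \dfrac{G_2}{F_2} \end{bmatrix}
\begin{bmatrix} 1 & b_{20} \\ b_{10} & 1 \end{bmatrix}^{-1} \qquad (6)$$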

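and the covariance matrix of e in (2) can in a similar
way be written as

$$\Lambda = E\{ee^T\} =
\begin{bmatrix} \sigma_1^2 + b_{20}^2\sigma_2^2 & b_{10}\sigma_1^2 + b_{20}\sigma_2^2 \\
b_{10}\sigma_1^2 + b_{20}\sigma_2^2 & b_{10}^2\sigma_1^2 + \sigma_2^2 \end{bmatrix}
= \begin{bmatrix} 1 & b_{20} \\ b_{10} & 1 \end{bmatrix}
\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}
\begin{bmatrix} 1 & b_{10} \\ b_{20} & 1 \end{bmatrix}
= D\,\mathrm{diag}(\sigma_1^2, \sigma_2^2)\,D^T. \qquad (7)$$

Computing the derivatives of H w.r.t. the elements
of $\theta$ and denoting $u_k$ a column vector with
unity in position k and zeros elsewhere, yields

$$\frac{\partial H}{\partial a_{1k}} = -\frac{B_1}{A_1^2}\, u_2 u_1^T z^{-k}\, G D^{-1} \qquad (8a)$$
$$\frac{\partial H}{\partial b_{1k}} = \frac{1}{A_1}\, u_2 u_1^T z^{-k}\, G D^{-1} \qquad (8b)$$
$$\frac{\partial H}{\partial f_{1k}} = -\frac{G_1}{F_1^2}\, C u_1 u_1^T z^{-k}\, D^{-1} \qquad (8c)$$
$$\frac{\partial H}{\partial g_{1k}} = \frac{1}{F_1}\, C u_1 u_1^T z^{-k}\, D^{-1} \qquad (8d)$$
$$\frac{\partial H}{\partial \sigma_1} = 0 \qquad (8e)$$

$$\frac{\partial H}{\partial a_{2k}} = -\frac{B_2}{A_2^2}\, u_1 u_2^T z^{-k}\, G D^{-1} \qquad (9a)$$
$$\frac{\partial H}{\partial b_{2k}} = \frac{1}{A_2}\, u_1 u_2^T z^{-k}\, G D^{-1} \qquad (9b)$$
$$\frac{\partial H}{\partial f_{2k}} = -\frac{G_2}{F_2^2}\, C u_2 u_2^T z^{-k}\, D^{-1} \qquad (9c)$$
$$\frac{\partial H}{\partial g_{2k}} = \frac{1}{F_2}\, C u_2 u_2^T z^{-k}\, D^{-1} \qquad (9d)$$
$$\frac{\partial H}{\partial \sigma_2} = 0. \qquad (9e)$$

Consider the second term in (3b). It is easily verified
that $H_k \Lambda H_l^{*}$ is zero if $H_k$ corresponds to an arbitrary
derivative in (8) and $H_l$ to an arbitrary derivative in
(9), and vice versa. This follows from the property that
$u_1$ and $u_2$ are orthogonal vectors. Thus, the second
term in (3b) contributes with a block diagonal matrix
to J. The first term of (3b) is zero except when $\theta_k$ and
$\theta_l$ equal $b_{10}$, $b_{20}$, $\sigma_1$ or $\sigma_2$, due to the fact that $\Lambda_k$ is
zero otherwise.

In order to visualize the compact form of J, denote
the first and the second term in (3b) with $L_{k,l}$ and
$S_{k,l}$ respectively. The Fisher Information matrix then
becomes

$$J = \begin{bmatrix} \Lambda_1 + S_1 & \Lambda_2 \\ \Lambda_3 & \Lambda_4 + S_2 \end{bmatrix} \qquad (10)$$

where $\Lambda_1$, $\Lambda_2$, $\Lambda_3$ and $\Lambda_4$ have nonzero values only in
the corners, whereas $S_1$ and $S_2$ vanish at the last row
and column:

$$\Lambda_1 = \begin{bmatrix} L_{1,1} & 0 & \cdots & 0 & L_{1,m} \\ 0 & & & & 0 \\ \vdots & & & & \vdots \\
0 & & & & 0 \\ L_{m,1} & 0 & \cdots & 0 & L_{m,m} \end{bmatrix} \qquad (11)$$

$$S_1 = \begin{bmatrix} S_{1,1} & \cdots & S_{1,m-1} & 0 \\ \vdots & & \vdots & \vdots \\
S_{m-1,1} & \cdots & S_{m-1,m-1} & 0 \\ 0 & \cdots & 0 & 0 \end{bmatrix}. \qquad (12)$$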


5  Simulations 

Using formula (3) and the structure discussed in section 4
the CRLB was computed for a TITO system with
FIR-channels and AR(2) source generating filters. The
numerical values for the CRLB are presented in Table 1.

Parameter  Value   CRLB        Parameter  Value   CRLB
b10        0.6     0.0626/N    b20        0.5     0.0628/N
b11        -0.2    0.0620/N    b21        0.1     0.0622/N
f11                0.518/N     f21        1.1
f12        0.7     0.514/N     f22        0.6
sigma1     1       2.13/N      sigma2     1       2.18/N

Table 1. CRLB values for FIR-channel and
AR(2) source filters
Simulations with a Recursive Prediction Error
Method (RPEM) applied to the source separation
problem are presented in figure 2. For a presentation
and analysis of the algorithm, see [1].

From figure 2 it is seen that the variances of the
parameters approach the CRLB. The fact that the
variances of the source filter parameters deviate from
the theoretical CRLB can be explained by the parsimony
principle: the model used in [1] is overparameterized
in the estimation of the noise covariance matrix (7).
The estimation of this matrix includes estimation of
three parameters, whereas it only depends on two
independent variables, namely $\sigma_1$ and $\sigma_2$, in addition
to the channel parameters $b_{10}$ and $b_{20}$.

Figure 2. Variance of estimated channel parameters:
asymptotic CRLB value (dashed) and parameter variance
from an RPEM estimate (solid)
6 Conclusions


In this paper a formula for the CRLB for the source
separation problem is derived. After a reformulation of
the problem the CRLB formula was found to be compact
and computationally appealing. Values of this
bound are computed for a simple test scenario and compared
with simulations of a recursive prediction error
method.
Abstract

Deterministic multichannel blind deconvolution is an important
problem arising in numerous areas of engineering.
Recently, two different approaches to solving this problem,
Maximum Likelihood techniques (like IQML) and subspace
techniques (like EVAM), have been proposed. These methods
are theoretically elegant and computationally efficient,
and questions arise as to what the properties of these estimators
are. We attempt to answer some of these questions
in this paper. We show that the subspace based EVAM estimator
is a coarse approximation of the IQML estimator.
We present a new iterative scheme to compute the ML estimator,
and Cramer-Rao bounds for the channel and input
estimates. In addition, we present a Monte-Carlo comparison
study of the two estimators and establish the superiority
of ML based techniques.


1.  Introduction 


Formally, the multichannel blind deconvolution problem
can be posed in the following manner. Given $y_1, \ldots, y_p$
where

$$a * x_i = y_i, \qquad 1 \le i \le p \qquad (1)$$

and $x_i(0) = 1$, $1 \le i \le p$, it is required to recover the
$x_i$ and $a$.

This  problem  arises  in  various  areas  of  engineering.  For 
example,  in  seismic  signal  processing,  it  is  required  to  re¬ 
cover  the  seismic  trace  from  its  convolutions  with  different 
(unknown)  acoustic  inputs.  The  problems  of  blind  channel 
identification  and  equalization  in  communication  systems, 
and  image  restoration  when  different  blurred  versions  of  the 
same  image  are  available  can  also  be  posed  in  this  fashion. 


*This work was supported in part by the National Science Foundation
grant No. MIP 91-57377, and a Schlumberger-Doll research grant.

†Yoram Bresler is on sabbatical leave at the Technion, Israel Institute
of Technology during 1995-96.


The single channel blind deconvolution problem has
multiple solutions and prior knowledge is required for its
solution. In contrast, the multichannel blind deconvolution
problem in Equation (1) has a unique solution if the channels
$x_i$ are FIR, and their z-transforms have no common
zeros [1, 4].

In this paper, we address the deterministic problem,
where the input $a$ and the channels $x_i$ are treated as
unknown, constant vectors. Recently, two different approaches
to solving this problem, Maximum Likelihood
techniques (like IQML) and subspace techniques (like
EVAM), have been proposed. These methods are theoretically
elegant and computationally efficient. However, to
date their performance limitations in the presence of noise
have not been satisfactorily explored. In this paper, we attempt
to do this. For the case of additive white Gaussian
noise, we

1. Present an efficient algorithm for the computation of
the Maximum Likelihood estimate (MLE).

2. Show that the EVAM estimate is an approximation to
the result obtained after the first iteration of an IQML-based
strategy to compute the ML estimate.

3. Present an asymptotic analysis of EVAM.

4. Present Cramer-Rao bounds for the channel and input
estimates.

5. Present the results of Monte-Carlo comparison studies
of the performance of EVAM and the MLE, and
demonstrate that the MLE is the superior estimator.

From this point on, we consider the case of p = 2. (Note
that both the techniques under study here can be easily extended
to address the case of more than 2 channels.)

2  The  Maximum  Likelihood  Estimator 

Consider the noisy version of the multichannel blind deconvolution problem, defined by

$y_i = a * x_i + \sigma \eta_i, \quad i \in \{1, 2\}$  (2)

where the $\eta_i$ are noise vectors whose components are $\mathcal{N}(0,1)$ and uncorrelated. In this case, the ML estimator is given by

$(a, x_1, x_2)_{ML} = \arg\min \sum_{i=1}^{2} \|a * x_i - y_i\|^2.$  (3)

Consider the case when the $x_i$ have equal length $N$, and their z-transforms have no common zeros. (When $n_{x_1} \neq n_{x_2}$, the shorter one can be assumed to be zero-padded.) Let $y_1$ and $y_2$ be according to Equation (2), and let $y = [y_1^T \; y_2^T]^T$. For any integer $m$ and any vector $x$ of length $n_x$, we define the $(n_x + m - 1) \times m$ Toeplitz "convolution matrix" $C_m(x)$ as

$C_m(x) = \begin{bmatrix} x(0) & & \\ \vdots & \ddots & \\ x(n_x - 1) & & x(0) \\ & \ddots & \vdots \\ & & x(n_x - 1) \end{bmatrix}$  (4)

so that $C_m(x)\, b = x * b$ for every vector $b$ of length $m$.


Slock et al. [10] have shown that the MLE of the channels can be written as

$(x_1, x_2)_{MLE} = \arg\max\; y^T \left[ A (A^T A)^{-1} A^T \right] y;$  (5)

where $A = \left[ [C_{n_a}(x_1)]^T \mid [C_{n_a}(x_2)]^T \right]^T.$  (6)

They have further shown that this can be re-written as $(x_1, x_2)_{MLE} = \arg\min J(x_1, x_2)$, where

$J(x_1, x_2) = y^T \left[ B^T (B B^T)^{-1} B \right] y;$  (7)

$B = \left[ C_{n_a+N-1}(-x_2) \mid C_{n_a+N-1}(x_1) \right].$  (8)

The latter step follows from the "minimal null-space parameterization" concept suggested by Slock et al. [10], who have proposed an IQML-type [2] iterative strategy to solve this problem. We propose another scheme based on gradient minimization of the cost function $J(\cdot)$. Following a development similar to the one in [9], it can be shown that the gradient $\nabla J(\cdot)$ of $J(\cdot)$ can be written as $Q(x)x$, where $Q$ is an appropriate matrix. A stationary point of $J(x)$ satisfies $Q(x)\,x = 0 \cdot x$. This nonlinear eigenvalue problem is solved using the following iterative algorithm.

1. Choose a starting point $x_0$.

2. For each $x_k$, construct $Q(x_k)$.

3. Choose $x_{k+1}$ to be the eigenvector corresponding to the eigenvalue of $Q(x_k)$ with the smallest absolute value.

4. Repeat until convergence.

The  computation  can  be  performed  efficiently,  by  using 
the  inverse  power  method  to  find  the  smallest  eigenvalue 
and  corresponding  eigenvector  of  Q(x),  and  exploiting  the 
structure  of  Q.  Simulations  show  the  algorithm  to  converge 
rapidly  to  the  correct  solution  for  moderate  and  high  signal 
to  noise  ratios  (SNRs). 
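
The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the matrix $Q(x)$ is not given explicitly in the text, so it is passed in as a hypothetical callback `build_Q`, and a dense eigensolver stands in for the inverse power method and structure exploitation mentioned above.

```python
import numpy as np

def eigenvector_iteration(build_Q, x0, max_iter=100, tol=1e-10):
    """Iterate x_{k+1} = eigenvector of Q(x_k) with the smallest |eigenvalue|.

    build_Q is a hypothetical user-supplied function returning the square
    matrix Q(x); its actual form for the blind deconvolution cost is not
    reproduced here.
    """
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(max_iter):
        eigvals, eigvecs = np.linalg.eig(build_Q(x))
        # Keep the eigenvector whose eigenvalue is closest to zero.
        v = np.real(eigvecs[:, np.argmin(np.abs(eigvals))])
        v = v / np.linalg.norm(v)
        if v @ x < 0:            # eigenvectors are only defined up to sign
            v = -v
        if np.linalg.norm(v - x) < tol:
            return v
        x = v
    return x
```

The sign correction is a design choice made here so that successive iterates can be compared directly; without it the stopping test could oscillate between $v$ and $-v$.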


Simulations indicate that, in terms of convergence characteristics and breakdown thresholds, this algorithm is very similar to the IQML-based strategy. Osborne et al. [9] show that, when B is a single Toeplitz matrix with the structure $C_m$, the eigenvector-iteration algorithm has linear convergence. It is conjectured that a similar result holds for our application.

3  Subspace  techniques 

Consider Equation (1) for the case when p = 2. If $Y_1, Y_2, G_1$ and $G_2$ are the z-transforms of the sequences $y_1, y_2, g_1$ and $g_2$, respectively, and length($g_1$) = length($g_2$) = $\max(n_{x_1}, n_{x_2})$, then all solutions $(G_1(z), G_2(z))$ to the equation

$Y_1(z)\, G_1(z) + Y_2(z)\, G_2(z) = 0 \quad \forall z \in \mathbb{C}$  (9)

have the form $G_1(z) = \alpha X_2(z)$ and $G_2(z) = -\alpha X_1(z)$ for some $\alpha \in \mathbb{C}$ (ref. [4]). This equation (and variants thereof for p > 2) forms the basis for the efficient and elegant subspace techniques that seem to have been developed simultaneously and independently by Liu et al. [8] and Gurelli et al. [4]. Gurelli et al. call this technique EVAM, and we shall continue to do so here. Although EVAM can be used successively to solve for $x_1$ and $x_2$ even if only upper limits for their lengths $n_{x_1}$ and $n_{x_2}$ are available [4], we limit ourselves to the case when these lengths are known exactly. The EVAM estimator can be used to recover the channels up to a scale factor, as

$(x_1, x_2)_E = \arg\min_{x_i}\; [x_2^T \;\; -x_1^T]\, R_y\, [x_2^T \;\; -x_1^T]^T;$  (10)

$R_y = [C_N(y_1) \mid C_N(y_2)]^T\, [C_N(y_1) \mid C_N(y_2)]$  (11)

and C is the convolution matrix defined earlier. (The appropriate scale factor has to come from prior knowledge, say $\|x_1\| = 1$ or $x_1(0) = 1$.) Equation (11) can be re-written as

$(x_1, x_2)_E = \arg\min\; y^T [B^T B]\, y$  (12)

where B is defined as in Equation (8). Thus it can be seen that the subspace technique is an approximation of the first iterate of the IQML technique (with $B B^T$ in Equation (7) replaced by the identity).
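
A minimal NumPy sketch of the EVAM estimate in Equations (10)-(11) is given below for the noiseless two-channel case. The helper names `conv_matrix` and `evam_estimate` are illustrative only, and the eigenvector is taken from a dense symmetric eigensolver rather than any specialised routine.

```python
import numpy as np

def conv_matrix(x, m):
    """(len(x)+m-1) x m Toeplitz matrix C_m(x): C_m(x) @ b equals np.convolve(x, b)."""
    x = np.asarray(x, dtype=float)
    C = np.zeros((len(x) + m - 1, m))
    for k in range(m):
        C[k:k + len(x), k] = x      # k-th column is x shifted down by k
    return C

def evam_estimate(y1, y2, n):
    """Return (x1_hat, x2_hat), each of length n, defined up to a common scale."""
    Y = np.hstack([conv_matrix(y1, n), conv_matrix(y2, n)])
    R = Y.T @ Y                     # R_y as in Equation (11)
    _, V = np.linalg.eigh(R)        # eigenvalues in ascending order
    v = V[:, 0]                     # minimiser of the quadratic form is [x2; -x1]
    return -v[n:], v[:n]
```

For example, with `y1 = np.convolve(a, x1)` and `y2 = np.convolve(a, x2)`, the returned vectors are proportional to `x1` and `x2`; the scale has to be fixed separately, for instance by imposing $x_1(0) = 1$.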

The EVAM estimate is exact when there is no noise and $a$ is "persistently exciting" [4], a condition that is satisfied with probability one when the elements of $a$ are drawn from a continuous probability distribution. However, when noise is present, the matrix $\hat{R}_y$ formed by the EVAM procedure is a perturbed version of $R_y$. The eigenvalues of the matrix are also perturbed, and the perturbation is not independent of $a$. The application of EVAM to noisy data is justified in [4] with the following argument: the perturbation $\hat{R}_y - R_y$ is a random matrix with mean $\sigma^2 I$, and the variance of each element diminishes as the data length $n_a$ grows. When $n_a$ tends to infinity, it can be seen that $\hat{R}_y \approx R_y + \sigma^2 I$, the eigenvectors of which are the same as those of $R_y$. So, theoretically, EVAM can work even for very large noise levels, provided the data length is sufficiently large.

In this paper, we present a theorem that provides further evidence of the reliability of EVAM at large data lengths. Using tools found in [3, 7], we have been able to show [5] the following.

Theorem 1. Let $x_i^E$ be the EVAM estimate of the channel $x_i$ obtained for the noisy multichannel blind deconvolution problem presented in Equation (2). Then $x_i^E$ is "asymptotically unbiased to second order". This means the following. Let $\sigma \ll 1$ and $n_a \gg n_{x_1} + n_{x_2}$. Then we can write $x_i^E$ as

$x_i^E = x_i + \sum_{j=1}^{\infty} \sigma^j u_{i,j}$  (13)

where the $u_{i,j}$ do not depend on $\sigma$ or $n_a$ (refer to [11]). Let $E_n[\cdot]$ denote expectation with respect to different noise realizations. Then $E_n[u_{i,1}] = 0$ and $E_n[u_{i,2}] = 0$ for $i \in \{1, 2\}$. Thus $E_n[x_i^E] = x_i + O(\sigma^3)$.


Figure 1. Plots of $e_E(\cdot)$ (solid lines), $e_{MLE}(\cdot)$ (dashed lines) and $c(\cdot)$ (dotted lines) as a function of the input length $n_a$. (a): channels; (b): input.

4 The Cramer-Rao bound

Given a family of distributions for a random variable $Z \in \mathcal{Z}$, indexed by a parameter $\theta \in \Theta$, the Cramer-Rao bound is a lower bound on the covariance of any unbiased estimator $e(z)$ of $\theta$. It is not always achievable, but is a commonly used benchmark against which the mean square errors of proposed estimators are compared. Let $C'_m$ be equal to the convolution matrix $C_m$ without its first column. With these definitions, and defining $\theta = [x_1(1), \ldots, x_1(n_{x_1} - 1), x_2(1), \ldots, x_2(n_{x_2} - 1), a^T]^T$, it can be shown [5] that the Cramer-Rao bound for $\theta$ in the scenario in Equation (2) is given by $\kappa(\theta)$, where

$\kappa(\theta) = \sigma^2 \begin{bmatrix} A & B \\ B^T & D \end{bmatrix}^{-1}$  (14)

$A = \begin{bmatrix} [C'_{n_{x_1}}(a)]^T [C'_{n_{x_1}}(a)] & 0 \\ 0 & [C'_{n_{x_2}}(a)]^T [C'_{n_{x_2}}(a)] \end{bmatrix}$  (15)

$B = \begin{bmatrix} [C'_{n_{x_1}}(a)]^T [C_{n_a}(x_1)] \\ [C'_{n_{x_2}}(a)]^T [C_{n_a}(x_2)] \end{bmatrix}$  (16)

$D = \sum_{i=1}^{2} [C_{n_a}(x_i)]^T [C_{n_a}(x_i)]$  (17)

Note that $\sigma^2 A^{-1}$ is the CRB for any unbiased estimator of the channels $x_i$ when the input $a$ is known, and $\sigma^2 D^{-1}$ is the CRB for any unbiased estimator of $a$ when the channels are known (non-blind case). Equation (14) can be further simplified (using standard results on matrix inversion [6]). The part of the CRB corresponding to the channels, $\kappa(x_1, x_2)$, can be shown to be

$\kappa(x_1, x_2) = \sigma^2 \left[ A^{-1} + A^{-1} B (D - B^T A^{-1} B)^{-1} B^T A^{-1} \right].$  (18)

It can be seen that the Cramer-Rao bounds for the blind case are greater than those for the non-blind case, as expected.

5 Monte Carlo Comparison Studies

In the studies described in this section, we attempt to evaluate the performances of the EVAM and ML estimators for the 2-channel blind deconvolution scenario described by Equation (2), by comparing them to each other and to the Cramer-Rao bound, under different conditions. In what follows, $x_E$ and $x_{MLE}$ are the EVAM and ML estimators for the channels, and $a_E$ and $a_{MLE}$ are similar estimators for the input. We define $e_E(x) \triangleq E_n[\|x_E - x\|^2]/(n_{x_1} + n_{x_2})$ and $e_E(a) \triangleq E_n[\|a_E - a\|^2]/n_a$, where the expectation $E_n$ is over different realizations of the noise vectors $\eta_i$. $e_{MLE}(x)$ and $e_{MLE}(a)$ are similarly defined. Also, we define the Cramer-Rao bounds $c(x) \triangleq \mathrm{trace}(\kappa(x))/(n_{x_1} + n_{x_2})$ and $c(a) \triangleq \mathrm{trace}(\kappa(a))/n_a$.
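
As a hedged illustration of how the channel benchmark $c(x)$ can be evaluated, the sketch below applies the block-inversion formula of Equation (18) to given blocks A, B and D of the matrix in Equation (14). The function names are illustrative, and the blocks are assumed to have been assembled beforehand; plain matrix inversion is used for clarity rather than numerical robustness.

```python
import numpy as np

def channel_crb(A, B, D, sigma):
    """Channel block of sigma^2 * [[A, B], [B^T, D]]^{-1}, computed as in Equation (18)."""
    A_inv = np.linalg.inv(A)
    schur = D - B.T @ A_inv @ B          # Schur complement of A
    return sigma**2 * (A_inv + A_inv @ B @ np.linalg.inv(schur) @ B.T @ A_inv)

def scalar_benchmark(kappa, n_params):
    """Normalised trace, e.g. c(x) with n_params = n_x1 + n_x2."""
    return np.trace(kappa) / n_params
```

The second term in `channel_crb` is positive semidefinite, which is the algebraic reason the blind bound is never smaller than the non-blind bound $\sigma^2 A^{-1}$.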

In the first study described here, we demonstrate the effect of the input length on the 2 estimators. Here is a description of the study.

• Choose $\sigma$.

• Choose random $x_1 \in \mathbb{R}^{n_{x_1}}$, $x_2 \in \mathbb{R}^{n_{x_2}}$.

• Choose $u \in \mathbb{R}^{200}$.

• For $n_a$ = 10 : 10 : 200, set $a = u(1 : n_a)$.

• Compute $e_E(x)$, $e_E(a)$, $e_{MLE}(x)$, $e_{MLE}(a)$, $c(x)$, $c(a)$ for each $n_a$.
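
The procedure above can be sketched in NumPy for the EVAM channel error $e_E(x)$ alone. This is an assumption-laden illustration, not the study's code: the channel and input distributions (standard normal), the number of noise trials, and the oracle least-squares fit used to resolve the scale ambiguity are all choices made here, and the ML estimator and the CRB are omitted.

```python
import numpy as np

def conv_matrix(x, m):
    """(len(x)+m-1) x m Toeplitz matrix: conv_matrix(x, m) @ b equals np.convolve(x, b)."""
    C = np.zeros((len(x) + m - 1, m))
    for k in range(m):
        C[k:k + len(x), k] = x
    return C

def evam_channels(y1, y2, n):
    """Stacked EVAM estimate [x1; x2], defined only up to a scale factor."""
    Y = np.hstack([conv_matrix(y1, n), conv_matrix(y2, n)])
    _, V = np.linalg.eigh(Y.T @ Y)
    v = V[:, 0]                               # eigenvector of the smallest eigenvalue
    return np.concatenate([-v[n:], v[:n]])

def input_length_study(sigma=1e-3, n_x=5, n_trials=20, seed=0):
    """Return {n_a: estimated e_E(x)} for n_a = 10, 20, ..., 200."""
    rng = np.random.default_rng(seed)
    x1, x2 = rng.standard_normal(n_x), rng.standard_normal(n_x)
    x_true = np.concatenate([x1, x2])
    u = rng.standard_normal(200)
    errors = {}
    for n_a in range(10, 201, 10):
        a = u[:n_a]
        total = 0.0
        for _ in range(n_trials):
            y1 = np.convolve(a, x1) + sigma * rng.standard_normal(n_a + n_x - 1)
            y2 = np.convolve(a, x2) + sigma * rng.standard_normal(n_a + n_x - 1)
            est = evam_channels(y1, y2, n_x)
            # Assumption: the scale is fixed by a least-squares fit to the true channels.
            scale = (est @ x_true) / (est @ est)
            total += np.sum((scale * est - x_true) ** 2)
        errors[n_a] = total / (n_trials * 2 * n_x)
    return errors
```

Calling `input_length_study()` returns a dictionary keyed by input length, which can then be plotted on a logarithmic axis.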


The results of one typical run (for a specific realization of the $x_i$ and $u$) are shown in Figure 1. In this case $n_{x_1} = n_{x_2} = 5$ and $\sigma = 0.001$, corresponding to an SNR of 60 decibels. Note that $\sigma$ was chosen to be rather small on purpose. In accordance with our analysis, this ensures that the EVAM estimates are essentially unbiased, thus justifying comparison with the CRB for unbiased estimation.

For all the plots in this paper, the y-axis is calibrated on a logarithmic scale. Though ML performs uniformly better than EVAM at all input lengths, the difference in performance becomes small for large $n_a$.

The next study compares the performance of the EVAM and ML estimators at different signal-to-noise ratios (SNRs). The methodology is the same as in the previous case, except that the quantities $e$ and $c$ are computed at different values of $\sigma$. The result of a typical run (for a specific choice of the $x_i$ and $u$) is shown in Figures 2(a) through (d).

It can be seen that ML outperforms EVAM by a substantial margin at all SNRs. However, the iterative algorithms for ML do not converge for low SNRs. So the ML estimates used in the $e_{MLE}$ plots in Figures 2(c) and (d) were computed using a descent algorithm. Note that the EVAM technique breaks down much sooner than ML. By SNR = 10 dB, both estimators have broken down.

Figure 2. ... (broken lines). The x-axis is numbered in multiples of $\max(n_{x_1}, n_{x_2})$.


The last study compares the errors in the M-L estimates to those in the EVAM estimates and to the Cramer-Rao bound by Monte-Carlo studies that sample various realizations of the channels and the input. The numbers $n_a = 12$, $n_{x_1} = 5$, $n_{x_2} = 5$ and $\sigma = 0.01$ (corresponding to SNR = 40 dB) are fixed, and the vectors $a$, $x_1$ and $x_2$ are generated randomly (according to some distribution) in $\mathbb{R}^{n_a}$, $\mathbb{R}^{n_{x_1}}$ and $\mathbb{R}^{n_{x_2}}$, respectively (100 times for this study). $c(x)$ and $c(a)$ are computed for each $(x_i, a)$. $e_E(\cdot)$ and $e_{MLE}(\cdot)$ are also computed by an internal Monte-Carlo run, with random noise realizations. Figure 3(a) contains a histogram of the quantity $e_E(x)/e_{MLE}(x)$. It can be seen that the ratios are often very large, indicating that the EVAM estimates are far worse than the M-L estimates. Figure 3(b) contains a similar histogram of the ratio $e_{MLE}(x)/c(x)$. The ratio is nearly always one, showing essentially efficient performance of the M-L criterion with finite data at this SNR.

Recall that for the value of $\sigma$ chosen, the inferior performance of EVAM is not due to threshold behavior.


Figure 3. (a): Histogram of the ratios $e_E(x)/e_{MLE}(x)$. (b): Histogram of the ratio $e_{MLE}(x)/c(x)$. In each case, the last bin has everything greater than 100.

ABSTRACT

The problem of multichannel blind signal deconvolution is considered. We show that input signals can be restored (or separated) using only the condition that they are statistically independent. Two main necessary and sufficient conditions involving high order cumulants are given and proved. Hence, a class of criteria for multichannel signal deconvolution is obtained. Self-adaptive gradient-based algorithms are derived in order to optimize the proposed criteria, and computer simulations are presented in order to demonstrate that the proposed algorithm works.


1. INTRODUCTION

The problem of multichannel blind signal deconvolution (or blind equalization) of Linear Time Invariant (LTI) systems is currently receiving a lot of attention; see [1]-[8] and references therein. The problem finds numerous applications in diverse fields of engineering and applied sciences, e.g. data communication, sonar processing, seismic exploration, antenna processing and speech processing.

In the past ten years, most of the proposed approaches have considered a restrictive model known as source separation [9]-[12], in which the coupling channels are assumed to be (unknown) constant gains. Here we consider the more general model in which the coupling channels are unknown LTI systems. It can be simply formulated as follows. Several linear (temporal and spatial) mixtures of certain independent signals called sources are observed. We want to recover the unknown original sources without knowing the mixing filter. Hence, this must be achieved using only the observations. This is the reason why this kind of approach is often described as "blind" or "unsupervised". In this paper the case of complex signals is considered.

2. PROBLEM FORMULATION

We consider the multichannel LTI and generally non-causal system described by

$x(t) = \sum_k G(k)\, a(t-k)$  (1)

where a(t) is the (N,1) vector of statistically independent sources, x(t) is the (N,1) vector of observations and {G(.)} is a sequence of (N,N) matrices which describes the impulse response of the LTI mixing filter.

The multichannel blind deconvolution problem consists in estimating an LTI filter (equalizer) {H(.)}, using only the observations x(t) of an unknown LTI system {G(.)}, such that the vector

$y(t) = \sum_k H(k)\, x(t-k)$  (2)

restores the N input signals $a_i$. We define the global LTI filter {S(.)} according to

$y(t) = \sum_k S(k)\, a(t-k).$  (3)

It is necessary to make the two following assumptions.

A1 Each source is a sequence of zero-mean complex independent and identically distributed (i.i.d.) continuous or discrete random variables. Without any loss of generality they are assumed to have unit power. Moreover we shall assume that non-zero cumulants of random variables exist and are finite whenever they are introduced. In particular, this implies that sources must be non-Gaussian. Finally we assume that the p-th order cumulants of the real and imaginary parts of each source are equal.

A2 The unknown LTI system {G(.)} is assumed stable and invertible.

Notice that assumption A1 is not very restrictive, e.g. in digital communication, since most signals have a symmetric constellation, e.g. 4-QAM, 16-QAM, V27.

Because sources are assumed unobservable, there are some inherent indeterminacies in their restitution. That is, in general, we cannot identify the order, the power and the time origin of each source. Indeed this combines the inherent indeterminacies of the source separation problem together with those of the classical blind scalar deconvolution problem. Hence signals are said to be separated if and only if (iff) the global LTI system {S(.)} reads

$S(z) = \sum_k S(k)\, z^{-k} = D(z)\, D_1\, P$  (4)

where D(z) is a diagonal matrix such that its entries are $d_{ii}(z) = z^{-m_i}$, $m_i$ integers, $D_1$ an invertible constant diagonal matrix and P a permutation matrix.
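
The separation condition (4) says that each output depends on exactly one source through exactly one (scaled, delayed) tap. A small illustrative check of that structure is sketched below; the array layout `S[k, i, j]` for the coefficient $s_{ij}(k)$ and the tolerance `tol` are assumptions made for this example.

```python
import numpy as np

def is_separating(S, tol=1e-8):
    """True if the global filter taps have the form D(z) D1 P of Equation (4).

    S has shape (K, N, N) with S[k, i, j] = s_ij(k) (assumed layout).
    """
    significant = np.abs(np.asarray(S)) > tol
    per_output = significant.sum(axis=(0, 2))   # non-zero taps in row i (over k and j)
    per_source = significant.sum(axis=(0, 1))   # non-zero taps in column j (over k and i)
    return bool(np.all(per_output == 1) and np.all(per_source == 1))
```

A filter that swaps two sources, delays them differently and rescales them passes this check, while any residual cross-talk above `tol` fails it.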
3. DECONVOLUTION CRITERIA

Contrast functions as defined in [2] constitute blind deconvolution criteria in the sense that they are maximum iff the relation in (4) holds for S(z). In the following, "white" vectors y are considered, i.e. vectors such that

$E[y(t)\, y^H(t - \tau)] = I\, \delta(\tau)$  (5)

where I is the (N,N) identity matrix, $\delta(.)$ the Dirac distribution and E the mathematical expectation operator. White vectors y are deduced from sources a through (3) if

$S(z)\, S^H(1/z^*) = I.$  (6)

Let us define the two functions $I_p^R$ and $I_p^I$ according to

$I_p^R(y) = \sum_{i=1}^{N} |C_p \mathcal{R}(y_i)| \quad \text{and} \quad I_p^I(y) = \sum_{i=1}^{N} |C_p \mathcal{I}(y_i)|$  (7)

where $C_p u$ is the p-th order cumulant of the real random variable u, p an integer greater than or equal to 3, and $\mathcal{R}(y_i)$ (resp. $\mathcal{I}(y_i)$) stands for the real (resp. imaginary) part of the complex random variable $y_i$. The following theorems are proved in the paper.

Theorem 1 The function $I_p^R(.)$ (resp. $I_p^I(.)$) for $p \ge 3$ is a contrast over the set of white random vectors having at most one null cumulant of order p of its real part (resp. imaginary part).

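Proof: We only consider $I_p^R(.)$ because the proof for $I_p^I(.)$ is completely similar. Clearly, $I_p^R(.)$ is symmetrical and invariant under scale change. Let us show that if {S(.)} is such that (6) holds then

$I_p^R(Sa) \le I_p^R(a).$  (8)

From (3) one has

$\mathcal{R}(y_i(t)) = \sum_{j,k} \mathcal{R}(s_{ij}(k))\, \mathcal{R}(a_j(t-k)) - \mathcal{I}(s_{ij}(k))\, \mathcal{I}(a_j(t-k)).$  (9)

Thus thanks to the independence of the sources

$C_p \mathcal{R}(y_i) = \sum_{j,k} \mathcal{R}^p(s_{ij}(k))\, C_p \mathcal{R}(a_j) + (-1)^p\, \mathcal{I}^p(s_{ij}(k))\, C_p \mathcal{I}(a_j).$  (10)

Since $C_p \mathcal{R}(a_j) = C_p \mathcal{I}(a_j)$ one has

$\sum_{i=1}^{N} |C_p \mathcal{R}(y_i)| \le \sum_{j=1}^{N} |C_p \mathcal{R}(a_j)|\, A_j$  (11)

where

$A_j = \sum_{i,k} \left( |\mathcal{R}(s_{ij}(k))|^p + |\mathcal{I}(s_{ij}(k))|^p \right).$  (12)

Now from (6), $\forall j$, $\sum_{i,k} |s_{ij}(k)|^2 = 1$, hence $A_j \le 1$ and (8) is realized.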


Let us consider the equality in (8). If one source, say $a_N$, is such that $C_p \mathcal{R}(a_N) = 0$, then equality in (8) requires the equality $A_j = 1$ for $j = 1, \ldots, N-1$, which holds if, for each $j = 1, \ldots, N-1$, there exists one and only one $i \in \{1, \ldots, N\}$ and one k such that $|\mathcal{R}(s_{ij}(k))| = 1$ or $|\mathcal{I}(s_{ij}(k))| = 1$. Because {S(.)} is such that (6) holds, S(z) is then of the form (4) and $I_p^R(.)$ is a contrast over the set of white random vectors having at most one null cumulant of order p of its real part. •

Hence by the theorem, for white random vectors y deduced from eq. (3), a necessary and sufficient condition for blind deconvolution is

$\sum_{i=1}^{N} |C_p \mathcal{R}(y_i)| = \sum_{i=1}^{N} |C_p \mathcal{R}(a_i)|$  (13)

or

$\sum_{i=1}^{N} |C_p \mathcal{I}(y_i)| = \sum_{i=1}^{N} |C_p \mathcal{I}(a_i)|.$  (14)

This leads to the two following constrained blind deconvolution criteria

$\max \sum_{i=1}^{N} |C_p \mathcal{R}(y_i)|$ subject to y white  (15)

$\max \sum_{i=1}^{N} |C_p \mathcal{I}(y_i)|$ subject to y white  (16)

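Now in the specific case of sources $a_i$ with identical sign $\varepsilon_p$ of the p-th order cumulant of $\mathcal{R}(a_i)$ and $\mathcal{I}(a_i)$ for all i, we have the following theorem.

Theorem 2 For even integer p > 3, the functions

$J_p^R(y) = \varepsilon_p \sum_{i=1}^{N} C_p \mathcal{R}(y_i) \quad \text{and} \quad J_p^I(y) = \varepsilon_p \sum_{i=1}^{N} C_p \mathcal{I}(y_i)$

are contrasts over the set of white random vectors having non zero cumulant of order p of its real and imaginary part.

The proof is easily deduced from Theorem 1 and eq. (10) where, if p is even, then $\mathrm{sign}(C_p \mathcal{R}(y_i)) = \mathrm{sign}(C_p \mathcal{R}(a_i)) = \varepsilon_p$.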

If we consider the value p = 4, we have the following simplified theorem.

Theorem 3 The functions

$K^R(y) = \varepsilon_4 \sum_{i=1}^{N} E\,\mathcal{R}^4(y_i) \quad \text{and} \quad K^I(y) = \varepsilon_4 \sum_{i=1}^{N} E\,\mathcal{I}^4(y_i)$

are contrasts over the set of white random vectors having non zero cumulant of order 4 of its real and imaginary part.

Proof: We only consider $K^R$. One has $C_4 \mathcal{R}(y_i) = E\,\mathcal{R}^4(y_i) - 3 E^2 \mathcal{R}^2(y_i)$. Since white vectors are considered, $E\,\mathcal{R}^2(y_i)$ is constant $\forall i$. Thus $K^R(y) = J_4^R(y) + \mathrm{cst}$, where cst is a certain constant. Then the theorem is proved. •
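
The fourth-order quantities used in this proof are easy to estimate from samples. The sketch below is illustrative only: it assumes zero-mean signals, replaces expectations by sample means, and uses the hypothetical names `c4_real_part` and `contrast_KR`; the array layout (one row per channel) is also an assumption.

```python
import numpy as np

def c4_real_part(y):
    """Sample estimate of C_4 R(y) = E R^4(y) - 3 E^2 R^2(y) for a zero-mean signal."""
    r = np.real(np.asarray(y))
    return np.mean(r**4) - 3.0 * np.mean(r**2) ** 2

def contrast_KR(Y, eps4):
    """Sample estimate of K^R(y) = eps4 * sum_i E R^4(y_i); Y has one row per channel."""
    r = np.real(np.asarray(Y))
    return eps4 * np.sum(np.mean(r**4, axis=1))
```

For unit-power 4-QAM symbols the real part takes the values $\pm 1/\sqrt{2}$, so `c4_real_part` gives $-1/2$ and the appropriate sign is `eps4 = -1`; with that sign, mixing two such sources by a 45-degree rotation lowers `contrast_KR` compared with the unmixed signals.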

As previously, we can deduce necessary and sufficient conditions for blind deconvolution and the corresponding maximization criteria.

4. SELF-ADAPTIVE ALGORITHM


In order to achieve the deconvolution, we have to find a filter {H} such that the proposed contrasts are maximum. A stochastic gradient based adaptive algorithm is proposed in this section. The domain of definition of the proposed contrasts is the set of white vectors. Hence in the following we consider that a first stage performs a multichannel spectral prewhitening of the observations. This "classical" stage will not be discussed here. In order to ensure the whiteness of y, {H} must be such that

$H(z)\, H^H(1/z^*) = I$  (17)

that is, the filtering transfer matrix is lossless or all-pass. Such a transfer matrix admits a special parametrization using planar (Givens) rotations, see e.g. [4]. In the simplest case (N = 2, k = 0, 1) one has


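$H(z) = Q_1(\theta_1, \phi_1) \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} Q_2(\theta_2, \phi_2)$  (18)

where

$Q_i(\theta_i, \phi_i) = \begin{pmatrix} e^{j\phi_i} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & e^{-j\phi_i} \cos\theta_i \end{pmatrix}$  (19)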
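
As a quick numerical illustration, the rotation of Equation (19) can be built and checked for unitarity as sketched below. The function names are illustrative, and the placement of the delay as diag(1, z^{-1}) between the two rotations is an assumption made for this sketch, since it cannot be read reliably from the source.

```python
import numpy as np

def givens(theta, phi):
    """Planar rotation Q(theta, phi) of Equation (19)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[np.exp(1j * phi) * c, s],
                     [-s, np.exp(-1j * phi) * c]])

def lossless_response(omega, theta1, phi1, theta2, phi2):
    """H(e^{j omega}) = Q1 diag(1, e^{-j omega}) Q2 (assumed delay placement)."""
    delay = np.diag([1.0, np.exp(-1j * omega)])
    return givens(theta1, phi1) @ delay @ givens(theta2, phi2)
```

Because each factor is unitary on the unit circle, `lossless_response` returns a unitary matrix for every frequency `omega`, which is the all-pass property required by (17).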


Using this parametrization, we now have to find the angles $\theta_i$ and $\phi_i$ in order to maximize one contrast. Denoting by p any one of the parameters $(\theta_i, \phi_i)$, a deterministic procedure is to reach the maximum of a contrast C using an iterative algorithm which updates p with the increment

$\Delta p = \mu\, \frac{\partial C}{\partial p}$  (20)

where $\mu$ is a small positive constant. Hence the optimum is found as the limit of the sequence

$p(n) = p(n-1) + \mu \left. \frac{\partial C}{\partial p} \right|_{p = p(n-1)}.$  (21)

For the contrasts in this paper, it is possible to express the criteria as the expectation of some random variable. We use a less complex version of the gradient algorithm (21), obtained by dropping the expectation. It will be called a "stochastic algorithm". For N = 2 and contrast $K^R(.)$, one easily has the stochastic increment

$\Delta p = 4 \mu \varepsilon_4 \left( \mathcal{R}^3(y_1) \frac{\partial \mathcal{R}(y_1)}{\partial p} + \mathcal{R}^3(y_2) \frac{\partial \mathcal{R}(y_2)}{\partial p} \right)$  (22)

where the $\partial \mathcal{R}(y_i)/\partial p$ are deduced from (2) and (18).

Convergence analysis of the proposed algorithm is beyond the scope of this paper. However, computer simulations are presented in order to demonstrate that the proposed algorithm works.


5.  COMPUTER  SIMULATIONS 

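The performance of the algorithm is measured by a performance index defined on the global filtering matrix {S} according to

$\mathrm{ind}(\{S\}) \triangleq \frac{1}{2} \left[ \sum_i \left( \frac{\sum_{j,m} |s_{ij}(m)|^2}{\max_{j,m} |s_{ij}(m)|^2} - 1 \right) + \sum_j \left( \frac{\sum_{i,m} |s_{ij}(m)|^2}{\max_{i,m} |s_{ij}(m)|^2} - 1 \right) \right]$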

This positive index is indeed zero if {S} is such that S(z) satisfies (4), and a small value indicates the proximity to the desired solution. We present simulations in the case of two sources. Three kinds of sources are considered: i) two 4-QAM communication sources; ii) two 16-QAM communication sources and iii) two constant modulus sources $\exp(j\phi)$, where $\phi$ is a random variable with uniform probability density over [0, 2π[. The mixing filter is of the form (18) where $\theta_1 = \pi/6$, $\phi_1 = \pi/18$, $\theta_2 = \pi/9$ and $\phi_2 = \pi/36$. The algorithm (22) is tested via Monte Carlo simulations. In Figs. 1, 2 and 3 we have plotted the sample average over 500 data realizations of the index as a function of iterations in cases i), ii) and iii), respectively. The index decreases monotonically and reaches the steady-state level of -33 dB, -27 dB and -28 dB, respectively, in the three cases. In Figs. 4 and 5 we have plotted one realization of the performance index, the estimated parameters, the observed signals and the reconstructed signals at channel 1 when steady state is achieved.

Figure 1: Sample average of the performance index for two 4-QAM sources.

Figure 2: Sample average of the performance index for two 16-QAM sources.

Figure 3: Sample average of the performance index for two constant modulus sources.

Figure 4: Performance index + parameters + observed signals + reconstructed signals for two 4-QAM sources.

Figure 5: Performance index + parameters + observed signals + reconstructed signals for two 16-QAM sources.
Abstract- Problems of separation of convolutive mixtures of wideband signals impinging on an antenna of
sensors often arise in signal processing. In seismic signal processing, techniques have been developed to perform separation of seismic waves (f-k or median filters, spectral matrix filtering, Radon or Karhunen-Loeve transforms, Maximum Likelihood methods). They give good results in most cases, yet they reach their limits in difficult contexts (waves of very close energies and/or similar slowness). We analytically study the resolving power of spectral matrix filtering to explain theoretically why the method no longer works for waves of close energies. This problem brings us to the question of the links between two bases: eigenvectors and steering vectors.

Keywords- Array Processing, Spectral Matrix Filtering, Seismic Signal Separation, Blind Processing of Wideband Signals.

1.  Introduction 

Problems of separation of convolutive mixtures of wideband signals impinging on an antenna of sensors are widespread. Typical examples can be found in passive sonar, geophysics, etc. In geophysical operations, the aims of signal processing are the separation and the identification of waves to get a better understanding of the onshore. Techniques have been developed to achieve these purposes (Karhunen-Loeve transform [5], f-k filter, median filter [4], spectral matrix filtering [7,8], Radon transform [2], Maximum Likelihood Estimator [3]). They give good results in many cases, but they reach their limits for waves of very close energies or too similar slowness. Focusing on spectral matrix filtering, we determine its resolving power analytically by studying the links between two bases: on the one hand, the eigenvector basis, which is the mathematical object given by the eigendecomposition of the spectral matrix of observed signals, and on the other hand, the steering vector basis, which is the physical object we are interested in. We explain how these two bases fit together. This fit depends on different parameters; our choice was to express results in terms of a geometrical criterion (i.e. the spatial coherency of the wave vectors) and the energy ratio of the sources.

II. Theoretical background
II.1. The model

We suppose that the antenna is linear and composed of N sensors. The signal $r_k(t)$ recorded on the k-th sensor is a linear combination of the p detected waves, plus an additive noise [9]. This noise is supposed to be spatially and spectrally white, Gaussian and independent of the signals of interest. Its spectral density is denoted $\sigma_b^2$. These assumptions are written in the time domain as follows:

$r_k(t) = \sum_{j=1}^{p} s_{kj}(t) * a_j(t) + b_k(t)$  (1)

where * is the convolution operator, $a_j(t)$ is a deterministic amplitude term (referred to as the source or wave-front): it does not convey information about the propagation; $s_{kj}(t)$ describes the propagation of the j-th wave recorded on the k-th sensor and $b_k(t)$ stands for the noise. The transcription of (1) into matrix notation gives:

$R(t) = S(t) * A(t) + B(t) = \sum_{j=1}^{p} O_j(t) + B(t)$  (2)

using the following notations:

• $R(t) = [r_1(t), \ldots, r_N(t)]^T$ is the (N,1) vector of the observations. $^T$ stands for the transposition operation.

• $S(t) = [S_1(t), \ldots, S_p(t)]$ is a (N,p) matrix whose k-th column is the so-called k-th steering vector, expressed as:

The phase of its first component is assumed to be null, which implies that the first sensor is chosen as a reference. This convention ensures the uniqueness of the sources. Besides, these steering vectors describe propagation on the antenna. Under the plane wave assumption with neither attenuation nor dispersion, the complex gain between two sensors reduces to a pure phase term. But in the general case, more complex phenomena have to be taken into account.

• $O_j = S_j(t) * a_j(t)$ is the j-th wave vector.

Equation (3) is obtained by Fourier transforming (2). The general problem is then divided into a set of problems of separation of instantaneous mixtures of signals. The computation at a given frequency bin no longer depends on those made at other frequency bins.

$R(\nu) = \underbrace{S(\nu)\, D(\nu)}_{S'(\nu)}\; \underbrace{D^{-1}(\nu)\, A(\nu)}_{A'(\nu)} + B(\nu)$  (3)

We will focus on the problem of separation of colored but uncorrelated sources. The diagonal renormalisation matrix D we have introduced ensures spectral whitening of the sources (i.e. the new sources $A'(\nu)$ have unit power). Whatever $S'(\nu)$, the matrix of the new steering vectors, its Singular Value Decomposition (SVD) is given by [12]:

$S'(\nu) = V(\nu)\, \Lambda^{1/2}(\nu)\, \Pi(\nu)$  (4)

where V is a unitary (N,N) matrix (i.e. $V V^H = I_N$; $^H$ denotes the conjugate transpose operator and $I_N$ is the (N,N) identity matrix), $\Lambda^{1/2}$ is a (N,p) diagonal matrix whose N-p last rows are null (it is obviously supposed that N > p), and $\Pi$ is a (p,p) unitary matrix, parametrized in the general case as a product of Givens rotation matrices ($\Pi'$), multiplied by a diagonal matrix of pure phase terms (P) [12]. In the simplest case, which is the two-wave case, its expression simplifies to:

$\Pi(\theta, \kappa, \varphi_1, \varphi_2) = \Pi'\, P = \begin{pmatrix} \cos\theta(\nu) & -\sin\theta(\nu)\, e^{-j\kappa(\nu)} \\ \sin\theta(\nu)\, e^{j\kappa(\nu)} & \cos\theta(\nu) \end{pmatrix} \begin{pmatrix} e^{j\varphi_1(\nu)} & 0 \\ 0 & e^{j\varphi_2(\nu)} \end{pmatrix}$  (5)

It only depends on four parameters which vary with the frequency.

II.2.  Eigendecomposition  of  spectral  matrix  and 
estimation  of  matrices  V  and  A 

To analytically determine the two matrices $V$ and $\Lambda$ involved in the parametrisation of $S'$, we build the spectral matrix $\Gamma(\nu)$, related to the observations $R(\nu)$ and defined as:


$$\Gamma(\nu) = \langle R(\nu)\,R^{H}(\nu) \rangle = V\,(\Lambda + \sigma_b^2 I_N)\,V^{H} \qquad (6)$$

where $\langle\cdot\rangle$ stands for an averaging operator. Equation (6) is obtained by reintroducing the parametrisation of the propagation matrix $S'$ given in equation (4). Because the eigendecomposition is unique, (6) can also be identified with the eigendecomposition of the spectral matrix. Thus, eigendecomposition determines two of the matrices that are sought: the first $p$ columns of matrix $V$ are the first $p$ eigenvectors of matrix $\Gamma$ (assuming that the eigenvalues are arranged in descending order). In the same way, the $p$ largest eigenvalues $\lambda_i$ of $\Gamma$ are related to $\Lambda$. In fact, we have:


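$$\Lambda = \begin{pmatrix} \mathrm{diag}(\lambda_1 - \sigma_b^2, \ldots, \lambda_p - \sigma_b^2) \\ 0 \end{pmatrix} \qquad (7)$$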


The eigenvectors associated with the $p$ largest eigenvalues belong to the same subspace (called the Signal Subspace (SS)) as the one spanned by the $p$ steering vectors of the desired wave vectors. Yet nothing guarantees that these two bases coincide exactly. This is because the unitary matrix $\Pi$, which is also involved in equation (4), cannot be reached by this treatment alone. Note also that the eigenvectors define an orthonormal basis, whereas the steering vectors are not necessarily orthogonal.
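The relation between the two bases can be checked numerically. The sketch below is only an illustration under stated assumptions: two hypothetical plane-wave steering vectors (the phase steps 0.3 and 0.9, the amplitudes, `N = 8` and the noise power are arbitrary choices, not values from the paper), unit-power uncorrelated sources and white noise. It builds the spectral matrix, extracts the two dominant eigenvectors, and compares the subspace they span with the one spanned by the steering vectors.

```python
import numpy as np

def steering(n_sensors, phase_step):
    """Plane-wave steering vector with the first sensor as reference."""
    return np.exp(1j * phase_step * np.arange(n_sensors))

N = 8
sigma2 = 0.1                          # assumed white-noise power
s1 = 2.0 * steering(N, 0.3)           # hypothetical whitened steering vector S'_1
s2 = 1.0 * steering(N, 0.9)           # hypothetical whitened steering vector S'_2
S = np.column_stack([s1, s2])

# Spectral matrix for unit-power uncorrelated sources plus white noise.
Gamma = S @ S.conj().T + sigma2 * np.eye(N)

eigval, eigvec = np.linalg.eigh(Gamma)        # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]              # rearrange in descending order
eigval, eigvec = eigval[order], eigvec[:, order]
V_sig = eigvec[:, :2]                         # two dominant eigenvectors

# Orthogonal projectors onto the two candidate signal subspaces.
P_eig = V_sig @ V_sig.conj().T
P_steer = S @ np.linalg.pinv(S)
same_subspace = np.allclose(P_eig, P_steer)

# The N - p smallest eigenvalues equal the noise power.
noise_eigs = eigval[2:]

# Cosine of the angle between the first eigenvector and the first steering vector.
cos_v1_s1 = abs(V_sig[:, 0].conj() @ s1) / np.linalg.norm(s1)
```

In this configuration `same_subspace` is true, while `cos_v1_s1` stays below 1: the eigenvectors span the signal subspace without being collinear to the steering vectors, which is the ambiguity carried by the unitary matrix.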

In the next sections, we explain how the two bases fit together, and we quantify the resolving power of spectral matrix filtering. The analytical calculations prove that, in most cases, treatments based on second-order properties of the received signals are not sufficient to separate the waves but enable extraction of the most energetic one. To reach separation, the treatment has to be completed; in other words, matrix $\Pi$ has to be estimated. In blind separation of wideband independent sources, this matrix is determined using the fact that it has to lead to the most independent sources [6,1] in the sense of a higher-order criterion. Blind separation of seismic waves has been performed by replacing this criterion with a local distance stationarity criterion applied to the phases of the estimated wave vectors [10,11].


III. Spectral matrix filtering : resolving power

We now focus on the case of two uncorrelated plane waves. The two vectors $V_1(\nu)$ and $V_2(\nu)$ associated with the two largest eigenvalues $\lambda_1$ and $\lambda_2$ have to be calculated analytically. To this end, we exploit the two following properties: these vectors are eigenvectors of matrix $\Gamma(\nu)$ (equation 8), and they are linear combinations of the steering vectors because they belong to the SS (equation 9):

$$\Gamma(\nu)\,V_i(\nu) = \lambda_i\,V_i(\nu), \quad i = 1, 2 \qquad (8)$$

$$V_1(\nu) = c_1 S'_1 + c_2 S'_2, \qquad V_2(\nu) = d_1 S'_1 + d_2 S'_2 \qquad (9)$$

where $c_1$, $c_2$, $d_1$, $d_2$ are complex numbers.

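This set of hypotheses leads to the following system, written here for $V_1$ (the same relations hold for $d_1$, $d_2$ with $\lambda_2$ in place of $\lambda_1$):

$$\begin{cases} c_1\,(\lambda_1 - \sigma_b^2 - Pa_1) = c_2\,S_1'^{H} S'_2 \\ c_2\,(\lambda_1 - \sigma_b^2 - Pa_2) = c_1\,S_2'^{H} S'_1 \end{cases} \qquad (10)$$

where $Pa_i = S_i'^{H} S'_i = \|S'_i\|^2$. To solve this system, different cases have to be distinguished: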

(i) Waves are geometrically orthogonal (i.e. $S_1'^{H} S'_2 = 0$) but the sources have different energies. Then the eigenvector associated with the largest eigenvalue is collinear to the steering vector of the most energetic wave, and the eigenvector associated with the second eigenvalue is collinear to the steering vector of the less energetic wave, as eq. (11) shows. The treatment is complete at the end of the second-order stage, in that the basis found already coincides with the desired basis:

$$\lambda_1 = \sigma_b^2 + Pa_1, \quad V_1 = \frac{1}{\sqrt{Pa_1}}\,S'_1; \qquad \lambda_2 = \sigma_b^2 + Pa_2, \quad V_2 = \frac{1}{\sqrt{Pa_2}}\,S'_2 \qquad (11)$$


(ii) The case of orthogonal waves with the same energy is a singular one. The two eigenvalues are always identical, so any vector belonging to the space spanned by the steering vectors is an eigenvector. The system always remains undetermined.

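(iii) We now suppose that the waves are not orthogonal. It can easily be established that the two largest eigenvalues of the spectral matrix are given by:

$$\lambda_{1,2} = \sigma_b^2 + \frac{(Pa_1 + Pa_2) \pm \sqrt{(Pa_1 - Pa_2)^2 + 4\,|S_1'^{H} S'_2|^2}}{2}$$

A condition on $c_1$, $c_2$ is deduced:

$$\frac{c_1}{c_2} = \frac{2\,S_1'^{H} S'_2}{(Pa_2 - Pa_1) + \sqrt{(Pa_1 - Pa_2)^2 + 4\,|S_1'^{H} S'_2|^2}}$$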

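We obtain the same kind of relation for $d_1$, $d_2$. These two ratios are representative of the geometrical organization between the two bases considered. The transformation that maps one basis onto the other is the multiplication by a compression matrix ($\Lambda^{1/2}$) and a unitary matrix expressed as a complex rotation matrix. In the two-wave case, it becomes:

$$\begin{pmatrix} S'_1 & S'_2 \end{pmatrix} = \begin{pmatrix} V_1 & V_2 \end{pmatrix} \begin{pmatrix} \sqrt{\lambda_1 - \sigma_b^2} & 0 \\ 0 & \sqrt{\lambda_2 - \sigma_b^2} \end{pmatrix} \Pi$$

Conditions on the coefficients $c_1$, $c_2$, $d_1$, $d_2$ are deduced from this last equality:

$$\begin{pmatrix} c_1 & d_1 \\ c_2 & d_2 \end{pmatrix} = \Pi^{H} \begin{pmatrix} 1/\sqrt{\lambda_1 - \sigma_b^2} & 0 \\ 0 & 1/\sqrt{\lambda_2 - \sigma_b^2} \end{pmatrix} \qquad (12)$$

Thus we have to parametrise the unknowns. Uniqueness of this parametrisation is ensured by the normalisation of the eigenvectors:

$$\arg(c_1) + \psi_1 = 0, \qquad \arg(d_2) + \psi_2 = 0, \qquad \kappa = \arg(d_2) - \arg(c_2)$$

and:

$$|c_1| = \frac{\cos\theta}{\sqrt{\lambda_1 - \sigma_b^2}}, \quad |c_2| = \frac{\sin\theta}{\sqrt{\lambda_1 - \sigma_b^2}}, \quad |d_1| = \frac{\sin\theta}{\sqrt{\lambda_2 - \sigma_b^2}}, \quad |d_2| = \frac{\cos\theta}{\sqrt{\lambda_2 - \sigma_b^2}}$$

so that

$$\frac{|c_2|}{|c_1|} = \tan\theta, \qquad \frac{|d_1|}{|d_2|} = \tan\theta$$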


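We now quantify the dependency of the angles $\theta$ and $\kappa$ of the unitary matrix on the parameters of interest. In our case, the two desired angles are expressed versus $E$, the energy ratio of the sources ($E = Pa_2/Pa_1$), and the spatial coherency $\rho$ between the two waves. $\rho$ is the normalized scalar product between the steering vectors (it is a geometrical criterion). In the case of plane waves with equispaced sensors, we have:

$$\rho(\nu) = \frac{S_1'^{H}(\nu)\,S'_2(\nu)}{\|S'_1(\nu)\|\,\|S'_2(\nu)\|} = \frac{1}{N}\,\frac{\sin(N\,\Delta\Phi(\nu))}{\sin(\Delta\Phi(\nu))}\,e^{j(N-1)\Delta\Phi(\nu)}$$

where $\Delta\Phi(\nu)$ is half the difference between the inter-sensor phase shifts of the two waves.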

The steering vector of the $k$-th wave is defined in the frequency domain:


1 


In the plane-wave case with equispaced sensors, the propagation matrix has a special structure: it is a Vandermonde matrix. The time delay on the $m$-th sensor is then given by:

where $d$ is the distance between two sensors, $c$ the propagation velocity of the sound and $\theta_k$ the angle of arrival of the wave on the antenna.
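The closed form of the spatial coherency for an equispaced array can be checked against a direct scalar product. The sketch below is a minimal illustration, not the authors' code: the delay law `m * d * sin(angle) / c` and the sign of the phase are conventions assumed here, and the numerical values (10 sensors, 50 Hz, 10 m spacing, 1500 m/s, angles of 10° and 25°) are hypothetical.

```python
import numpy as np

def ula_steering(n_sensors, freq, spacing, velocity, angle):
    """Plane-wave steering vector on an equispaced array (first sensor = reference).

    The sin() delay law and the sign of the exponent are assumptions of this sketch.
    """
    delays = np.arange(n_sensors) * spacing * np.sin(angle) / velocity
    return np.exp(-2j * np.pi * freq * delays)

def coherency(s1, s2):
    """Normalized scalar product between two steering vectors."""
    return (s1.conj() @ s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))

N, freq, d, c = 10, 50.0, 10.0, 1500.0          # hypothetical array and frequency
th1, th2 = np.deg2rad(10.0), np.deg2rad(25.0)   # hypothetical angles of arrival

rho = coherency(ula_steering(N, freq, d, c, th1),
                ula_steering(N, freq, d, c, th2))

# Half the difference between the inter-sensor phase shifts of the two waves.
dphi = np.pi * freq * d * (np.sin(th1) - np.sin(th2)) / c
rho_closed = np.sin(N * dphi) / (N * np.sin(dphi)) * np.exp(1j * (N - 1) * dphi)
```

Under these conventions `rho` and `rho_closed` coincide, and the modulus of `rho` lies between 0 and 1, reaching 1 only when the two angles of arrival are equal.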

The modulus of the spatial coherency varies between 0 and 1: $|\rho| = 0$ for geometrically orthogonal waves, which becomes true if the number of sensors is large and the angles of arrival are different; $|\rho| = 1$ for collinear waves. Finally, we find that:


and  K=y/.^  -  y/^  -  iN - !)• 

It is also possible to get the expression of the eigenvectors, which makes it possible to quantify the resolving power of the spectral matrix. We have established that:

$$\frac{\text{Power of } S_1 \text{ on } V_1}{\text{Power of } S_2 \text{ on } V_1} = \frac{1}{(\tan\theta)^2} = \frac{\text{Power of } S_2 \text{ on } V_2}{\text{Power of } S_1 \text{ on } V_2}$$

Waves of identical energy characterize a singular case, because the angle $\theta$ no longer depends on the spatial coherency: it remains equal to 45° (figure 1). Moreover, it is the least favorable case in terms of separation, to the extent that, after the second-order stage, the sources still remain totally mixed (the same proportion of each source on both whitened signals (figure 2)). In the case of orthogonal waves (spatial coherency equal to 0), the angle $\theta$ remains equal to 0° (separation is achieved after simple projection onto the eigenvectors). In all other cases, separation is still not achieved after the second-order stage, but on the first eigenvector the proportion of the most energetic source is far greater than that of the least energetic source. On the second eigenvector, although the second source is less energetic, its proportion remains greater than that of the most energetic source.
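The behaviour described above can be reproduced with a short numerical sketch. The closed form used in `rotation_angle` is not quoted from the paper: it is an expression derived here from the relation $\tan\theta = |c_2|/|c_1|$ and the eigenvector conditions, assuming wave 1 is the more energetic one ($E \le 1$). The function `rotation_angle_numeric` cross-checks it through the dominant eigenvector of the 2x2 Gram matrix of the steering vectors, normalised so that $Pa_1 = 1$.

```python
import numpy as np

def rotation_angle(energy_ratio, coherency_modulus):
    """Angle theta (radians) from an assumed closed form of tan(theta) = |c2| / |c1|.

    energy_ratio      E = Pa_2 / Pa_1, with 0 < E <= 1
    coherency_modulus |rho|, between 0 and 1
    """
    E, r = energy_ratio, coherency_modulus
    num = 2.0 * r * np.sqrt(E)
    den = (1.0 - E) + np.sqrt((1.0 - E) ** 2 + 4.0 * r ** 2 * E)
    return np.arctan2(num, den)

def rotation_angle_numeric(energy_ratio, coherency_modulus):
    """Same angle from the dominant eigenvector of the Gram matrix (Pa_1 = 1)."""
    E, r = energy_ratio, coherency_modulus
    gram = np.array([[1.0, r * np.sqrt(E)],
                     [r * np.sqrt(E), E]])
    _, vecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    c1, c2 = vecs[:, -1]                 # coefficients of V_1 on S'_1 and S'_2
    return np.arctan2(abs(c2), abs(c1))

theta_equal_energy = np.degrees(rotation_angle(1.0, 0.3))   # identical energies
theta_orthogonal = np.degrees(rotation_angle(0.5, 0.0))     # orthogonal waves
theta_general = np.degrees(rotation_angle(0.25, 0.6))       # generic case
```

With identical energies the angle is 45° whatever the coherency, and with orthogonal waves of different energies it is 0°, in agreement with the two limiting cases discussed in the text.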

Figure 1 : Variations of the angle $\theta$ versus energy ratio and spatial coherency of the waves


In this work we explain how the bases of steering vectors and eigenvectors fit together and how this fitting depends on different parameters such as the energy ratio of the waves and their degree of spatial correlation.
Abstract

Recent studies have shown that, given spatial or temporal diversity, blind identification / equalization is perfectly achievable under some conditions on the channel transfer function and the amount of data considered. However, in the presence of channel noise, equalization can no longer be achieved perfectly. We study the best achievable linear equalizer performance in terms of the input / output minimum mean square error (MMSE), defining the channel equalizability as a function of the multichannel transfer function roots and the signal to noise ratio (SNR). We show that a lower bound on channel disparity can be deduced as a function of the SNR in order to achieve a given MMSE.


Keywords:  Fractionally  spaced  /  multichannel 
equalization,  channel  disparity. 

1.  Introduction 

Equalization is a crucial part of digital communication systems [1]. The way equalization is implemented is a trade-off between reaching high performance and computation cost. In particular, the equalizer length determines the computational load. However, it must be chosen carefully so as to guarantee the performance required by the remaining parts of the system. The usual one-input / one-output channel equalization problem is known to require an equalizer length proportional to the inverted channel impulse response, the value of which is prohibitive for short FIR channels. One-input / multiple-outputs equalization induced by channel diversity was recently shown to be perfectly achievable with a finite-length equalizer ([2], [3], [5], [6], ...) under some Zero-Forcing (ZF) conditions to be recalled later. First, we quantify the best achievable equalization performance when the propagation is disturbed by additive channel noise. Performance is measured by the input / output minimum mean square error (MMSE). We then investigate the links between this lower bound and a measure of channel diversity. This should provide, for a given amount of channel noise, a measure of the constraint on the channel required to equalize to a given MMSE. It should help in evaluating the benefit of additional diversity with respect to the equalizer length to be used. Such a bound should also be very useful for comparing the performance of algorithms and criteria.

2.  Spatio-Temporal  Equalization 


Figure 1. Noisy Fractionally-Spaced Equalization scheme


The one-input / multiple-outputs channel model ($c(z) = (c_1(z), \ldots, c_L(z))^T$) is a well-suited formalism for spatial diversity (i.e., a sensor array) as well as temporal diversity (i.e., sampling the received signal at a higher rate than the emitted sequence), see [4]. The induced equalization problem consists of choosing an $L$-variate equalizer transfer function $e(z) = (e_1(z), \ldots, e_L(z))^T$ such that

$$y(n) = [e(z)^T]\,r(n) = e^T R(n) \qquad (1)$$

estimates $s(n - \nu)$ as well as possible, as in Figure 1. $R(n)$ is the observation regression vector containing the $N$ past observations of $r(n)$, and $\nu$ is the channel-equalizer delay. Assuming $c(z)$ is a polynomial vector of degree $Q$, equation (1) can be rewritten as

$$y(n) = e^T C\,S(n) + e^T W(n)$$

where $S(n) = (s(n), s(n-1), \ldots, s(n - Q - N + 1))^T$ contains the $N + Q$ past samples of $s(n)$. $W(n)$ is defined like $R(n)$, and $C$ is an $NL \times (N + Q)$ (Sylvester) channel convolution matrix defined by the taps of the degree-$Q$ multivariate transfer function $c(z)$, as in [4].
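The structure of the convolution matrix $C$ can be made concrete with a small sketch. Everything specific below is an assumption made for illustration: the helper name `channel_matrix`, the choice of stacking the rows subchannel by subchannel, and the hypothetical roots of the two subchannels. The block builds $C$ for $L = 2$, $Q = 2$, $N = 2$ and computes a zero-forcing equalizer for one delay by solving $C^T e = h_\nu$.

```python
import numpy as np

def channel_matrix(subchannels, n_taps):
    """Stack the N x (N+Q) convolution matrices of the L subchannels.

    subchannels : array of shape (L, Q+1), taps of each c_k(z) by increasing delay
    n_taps      : N, number of past observations kept per subchannel
    Returns C of shape (N*L, N+Q) such that R(n) = C S(n) in the noise-free case.
    """
    subchannels = np.atleast_2d(np.asarray(subchannels, dtype=float))
    L, Q = subchannels.shape[0], subchannels.shape[1] - 1
    C = np.zeros((n_taps * L, n_taps + Q))
    for k in range(L):
        for i in range(n_taps):
            # Row (k, i) produces r_k(n - i) from S(n).
            C[k * n_taps + i, i:i + Q + 1] = subchannels[k]
    return C

# Hypothetical subchannels defined by their roots (not the values of Example 1).
taps = np.array([np.poly([0.5, -0.4]), np.poly([0.8, 1.2])])
N = 2
C = channel_matrix(taps, N)

# Zero-forcing equalizer for delay nu: minimum-norm solution of C^T e = h_nu.
nu = 1
h_nu = np.zeros(C.shape[1])
h_nu[nu] = 1.0
e_zf = C @ np.linalg.solve(C.T @ C, h_nu)
```

Since the two hypothetical subchannels share no root, `C` has full column rank and `C.T @ e_zf` reproduces the pure delay `h_nu`.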

ZF is defined here in terms of the invertibility of the channel transfer function as:

$$e^T(z)\,c(z) = z^{-\nu}$$

and is guaranteed under the following conditions expressed in terms of $C$. Under the ZF conditions (i.e., $N - 1 \geq Q$ and no roots common to all subchannel transfer functions), $C$ is full column-rank, so that any channel-equalizer global impulse response $h = C^T e$ is achievable, in particular the ZF response $h_\nu$ corresponding to $z^{-\nu}$. Thus, in the absence of noise, these conditions allow perfect identification of the channel transfer function and perfect equalization, i.e., $y(n) = s(n - \nu)$.

However, as soon as there is a "non-negligible amount" of channel noise, perfect equalization can no longer be performed, even if the channel impulse response is exactly identified. In particular, if the noise is filtered by a ZF equalizer (the transfer function of which is calculated as a function of the locations of the poles and zeros of the channel transfer function), it may be enhanced so that the signal to noise ratio (SNR) at the equalizer output is reduced.

Example 1: the roots of $c_1(z)$ and $c_2(z)$ are respectively 0.59, ^.1 and 0.6, 1.3. ZF may be achieved for $N = 2$, and the equalizers corresponding to the different possible delays have their norms displayed in the following table:


  ν        0        1        2        3
  ||e||    250.0    250.0    326.9    25.8

So  that  the  noise  may  be  enhanced  by  a  factor  of  more 
than  250  at  the  equalizer  output,  reducing  all  the  more 
the  output  SNR  and  performances.  Of  course,  some 
algorithms  do  not  try  to  achieve  ZF  but  a  trade-off  in 
terms  of  performances  between  ZF  and  noise  enhance¬ 
ment,  see  [8]  for  instance. 

In  order  to  design  robust  algorithms,  i.e.,  so  to  bal¬ 
ance  noise  enhancement,  we  need  to  better  understand 
what  induces  it.  This  goal  motivates  the  following 
study  of  the  channel  roots  locations  effect  on  the  equal¬ 
ization  performances. 


3. MMSE

When the channel is affected by additive white noise (independent of the source sequence), a measure of achievable direct linear equalization performance is given by the minimization of the input-output normalized MSE,

$$E[(y(n) - s(n-\nu))^2]\,/\,E[s^2] = \|h - h_\nu\|^2 + \gamma\,\|e\|^2 \qquad (2)$$

under the constraint $h = C^T e$, where $\gamma = E[w^2]/E[s^2]$ is the noise to signal ratio.

Under the ZF conditions, any value of the $(N+Q)$-long vector $h$ is achievable, so that the minimum of (2) is proved ([8]) to correspond to $e = C(C^T C)^{-1} h$ with

$$h = (I + \gamma\,(C^T C)^{-1})^{-1}\,h_\nu$$

which departs from ZF all the more as $\gamma$ is high. The resulting MMSE is expanded in terms of $\gamma$ as:

$$\mathrm{MMSE} = \sum_{m=0}^{\infty} (-1)^m\,\gamma^{m+1}\,h_\nu^T (C^T C)^{-(m+1)} h_\nu$$

Such an expansion is valid for small enough values of $\gamma$, precisely when $\|\gamma\,(C^T C)^{-1}\| < 1$.

Note that when a channel identification method is used, the maximum likelihood estimator of the input sequence in the presence of white Gaussian noise induces an input / output MSE equal to $\gamma\,h_\nu^T (C^T C)^{-1} h_\nu$ when the channel is perfectly estimated. Namely, it is equal to the minimal value of $\gamma\|e\|^2$ constrained to ZF, i.e., $h_\nu = C^T e$. This expression of the MSE is the first-order approximation of the full MMSE expression above. It will be denoted as

$$\mathrm{MMSE}(\gamma) = \gamma\,\|e\|^2 = \gamma\,h_\nu^T (C^T C)^{-1} h_\nu \qquad (3)$$

The optimal delay $\nu$ is therefore chosen so as to minimize $h_\nu^T (C^T C)^{-1} h_\nu$. This value, which is bounded by the inverses of the extremal eigenvalues of $C^T C$, is the one of interest here.
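The first-order expression (3) and the exact minimum of (2) can be compared numerically for every delay. The sketch below is an illustration under assumptions: the subchannel roots, the equalizer length and the value of $\gamma$ are hypothetical, the helper names are introduced here, and the exact MMSE is evaluated as $\gamma\,h_\nu^T A (I + \gamma A)^{-1} h_\nu$ with $A = (C^T C)^{-1}$, which follows from inserting the optimal $h$ into (2).

```python
import numpy as np

def channel_matrix(subchannels, n_taps):
    """NL x (N+Q) convolution matrix; rows are stacked subchannel by subchannel."""
    subchannels = np.atleast_2d(np.asarray(subchannels, dtype=float))
    L, Q = subchannels.shape[0], subchannels.shape[1] - 1
    C = np.zeros((n_taps * L, n_taps + Q))
    for k in range(L):
        for i in range(n_taps):
            C[k * n_taps + i, i:i + Q + 1] = subchannels[k]
    return C

def zf_mse(C, gamma):
    """First-order MSE gamma * h_nu^T (C^T C)^{-1} h_nu for every delay nu."""
    A = np.linalg.inv(C.T @ C)
    return gamma * np.diag(A)            # h_nu is a unit vector, so pick A[nu, nu]

def exact_mmse(C, gamma):
    """Exact minimum of ||h - h_nu||^2 + gamma ||e||^2 for every delay nu."""
    A = np.linalg.inv(C.T @ C)
    return np.diag(gamma * A @ np.linalg.inv(np.eye(len(A)) + gamma * A))

# Hypothetical subchannels (roots chosen for illustration only).
taps = np.array([np.poly([0.5, -0.4]), np.poly([0.8, 1.2])])
C = channel_matrix(taps, n_taps=2)
gamma = 1e-6                              # hypothetical noise to signal ratio

first_order = zf_mse(C, gamma)
exact = exact_mmse(C, gamma)
best_delay = int(np.argmin(first_order))  # delay minimizing expression (3)
```

For such a small $\gamma$ the two curves are nearly identical, and the exact MMSE never exceeds the zero-forcing value, which is what makes (3) a convenient upper approximation.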

4. A Measure of Channel Disparity

Under the ZF conditions, the invertibility of $C^T C$ is given, for $L = 2$, by:

$$\det(C^T C) = K \prod_{i,j} |z_i^1 - z_j^2|^2 \qquad (4)$$

where $\prod$ stands for product, $K$ is a polynomial function of $(\|c_1\|^2 + \|c_2\|^2)$ with $\|c_k\|^2 = \sum_{i=0}^{Q} c_k(i)^2$, and $z_i^k$ is the root $i$ of subchannel $k = 1, 2$. Note that it is quite difficult to fully express $K$, even using Sylvester resultant results ([9]) and symbolic calculus.
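A special case of (4) can be verified directly. When $L = 2$ and $N = Q$, the matrix $C$ is square and coincides with the classical Sylvester matrix of the two subchannel polynomials, so that for unit leading taps $\det(C^T C)$ equals the squared product of root differences (that is, $K = 1$ in this particular normalisation). The sketch below assumes exactly this setting; the roots are hypothetical and the helper names are introduced here.

```python
import numpy as np
from itertools import product

def sylvester(c1, c2):
    """Square convolution matrix C for L = 2 subchannels of degree Q and N = Q."""
    Q = len(c1) - 1
    C = np.zeros((2 * Q, 2 * Q))
    for i in range(Q):
        C[i, i:i + Q + 1] = c1
        C[Q + i, i:i + Q + 1] = c2
    return C

def root_product(z1, z2):
    """Product over all root pairs of the squared differences."""
    return np.prod([(a - b) ** 2 for a, b in product(z1, z2)])

z1 = [0.50, -0.3]      # hypothetical roots of subchannel 1
z2 = [0.51, 1.3]       # hypothetical roots of subchannel 2 (one root close to z1)

C = sylvester(np.poly(z1), np.poly(z2))
det_gram = np.linalg.det(C.T @ C)
det_roots = root_product(z1, z2)

# With a root shared by both subchannels the determinant collapses to zero.
C_common = sylvester(np.poly([0.5, -0.3]), np.poly([0.5, 1.3]))
det_common = np.linalg.det(C_common.T @ C_common)
```

Here `det_gram` matches `det_roots`, and the single pair of roots separated by 0.01 is enough to make the determinant very small, which illustrates how close roots of different subchannels degrade the conditioning of $C^T C$.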
Equation (4) connects the ZF condition "no common roots" to the identification / equalization performance given by (3). In particular, (4) also shows how close subchannel roots can create significant noise enhancement, as in Example 1 where one root difference is 0.01. From there, we can define a measure of channel disparity by $\gamma / (K \prod_{i,j} |z_i^1 - z_j^2|^2)$, which becomes all the larger as zeros of different subchannels get close to each other. However, one should note that a small value of $\det(C^T C)$ does not necessarily imply a large value of MMSE($\gamma$); see Example 1 for $\nu = 3$.

The question of interest here is:

When are two "numerically close" roots so close that there is "lack of disparity"?

We propose to study the minimum distance between the two closest roots allowing "disparity". Thus, we say there is channel lack of disparity when the actual equalization MMSE is better approximated by the MMSE value obtained by considering the two roots as equal than by the value deduced from (3). Otherwise, i.e., when the actual MMSE is better approximated by (3), we say that the channel presents spatio-temporal diversity.

4.1.  Non-achievable  ZF 

To be able to quantify a disparity bound, we need to look at the extreme case in which there are mathematically equal roots.

In a previous contribution [7], we have shown that when there are roots common to all subchannels (referred to as lack of disparity), $C$ is no longer full column-rank but can be factored as a product of two Sylvester matrices, $C = \bar{C}\,C_0$, where $\bar{C}$ is full column-rank and $C_0$ full row-rank. $C_0$ is the convolution matrix associated with $c_0(z)$, which is formed by the $Z_0$ common roots. $\bar{C}$ is associated with the remaining multichannel transfer function, $\bar{c}(z)$. In that case, ZF is no longer achievable, and the closest achievable response to the ZF $h_\nu$ is $h = C_0^T (C_0 C_0^T)^{-1} C_0\,h_\nu = \Pi_0 h_\nu$, which is the projection of $h_\nu$ on the range of the non-full-column-rank $C_0^T$.

The MMSE must then be calculated using the previous factorization of $C$. Contrary to the case of the ZF conditions, $h$ is no longer an unconstrained parameter, since it has to lie in the range of $C_0^T$. The only unconstrained parameter to be used for the minimization is $\tilde{e}$ such that $h = C_0^T \tilde{e}$ and $\tilde{e} = \bar{C}^T e$. The optimal value of $e$ for a given $\tilde{e}$ is $e(\tilde{e}) = \bar{C}(\bar{C}^T \bar{C})^{-1} \tilde{e}$, so that the MMSE is obtained by minimizing over $\tilde{e}$:

$$\|C_0^T \tilde{e} - h_\nu\|^2 + \gamma\,\|\bar{C}(\bar{C}^T \bar{C})^{-1} \tilde{e}\|^2$$

We can thus deduce a first-order approximation in terms of $\gamma$ of the MMSE, MMSE$_0(\gamma)$ =

$$\|(I - \Pi_0)\,h_\nu\|^2 + \gamma\,h_\nu^T C_0^{\#} (\bar{C}^T \bar{C})^{-1} (C_0^{\#})^T h_\nu \qquad (5)$$

with $C_0^{\#} = C_0^T (C_0 C_0^T)^{-1}$.

To define the disparity bound, we want to consider the conditions where $\|\gamma (C^T C)^{-1}\| \ll 1$ and $\|(C^T C)^{-1}\| \gg 1$. Thus, MMSE($\gamma$) in (3) must be compared to the zeroth-order approximation in terms of $\gamma$ of MMSE$_0(\gamma)$. Therefore, we can define the disparity bound as:

$$\gamma\,h_\nu^T (C^T C)^{-1} h_\nu = h_\mu^T (I - \Pi_0)\,h_\mu \qquad (6)$$

where $\nu$ and $\mu$ are the values minimizing the expression in which they are involved.

Example 2: In order to look at the lack of diversity bound, let us first consider the simplest case of two subchannels given by $c_k(z) = 1 - \xi_k z^{-1}$, $k = 1, 2$. Denoting $\xi_2 = \xi_1 + \epsilon$, we want to check what happens as $|\epsilon|$ decreases towards 0. The "numerical border of disparity" occurs when the MMSE calculated for supposedly distinct roots in (3) becomes larger than the MMSE calculated for these roots taken as equal in (5). With $N = 2$ (which is a large enough equalizer length when the roots are distinct), a simple formal calculation leads to the two expressions,

(3): $\mathrm{MMSE}(\gamma) = \gamma f(\xi_1)/\epsilon^2 + O(\gamma/\epsilon)$,
(5): $\mathrm{MMSE}_0(\gamma) = g(\xi_1) + O(\gamma)$,

where $f$ and $g$ are rational bounded functions, the expressions of which are omitted for the sake of space and clarity.


Next, we propose a disparity bound:

depending on the signal to noise ratio, SNR = $-10\log(\gamma)$, and the location of the root $\xi$. We display in Figure 2 the disparity bound $\epsilon$, i.e., the distance one should ensure between two "close roots" to provide disparity, versus their location $\xi$ and for several values of SNR.

Figure 2. Disparity bound vs. root location and SNR

Indeed, the greater the SNR is, the closer the roots can be and still provide disparity. Note that the bound values can become so large that disparity is not possible. Furthermore, it appears that the closer $\xi$ is to 1 (i.e., the unit circle when generalizing to complex numbers), the smaller the disparity bound is. It means that diversity matters all the more when the channel roots are close to the unit circle. This result is crucial, since roots close to the unit circle are a very difficult condition for equalization when there is no diversity.

4.2.  Additional  Diversity 

Another  important  question  is  whether  additional 
channel  diversity  improves  significantly  the  equaliza¬ 
tion  performance. 

We also want to extend the expression of $\det(C^T C)$ to the case of $L \geq 3$ to fully understand the condition "no common roots to all subchannels". Generalized Sylvester resultant calculus (see [9] for classic results) leads to the fact that the rank of $C^T C$ is equal to $N + Q - Z_0$, each common zero reducing the rank by its multiplicity. The determinant must therefore be expressed as a weighted sum of products. Symbolic calculus and simple examples lead us to suggest the following expression:

$$\det(C^T C) = \sum_{k<l} K_{k,l} \prod_{i,j} |z_i^k - z_j^l|^2$$

where $K_{k,l}$ is some polynomial bounded function of the subchannels $k$ and $l$. This measure shows the possible gain of disparity when one increases the diversity, i.e., when the number of subchannels is increased by either spatial or temporal diversity.

Example 3: To make this gain explicit, let us consider the two subchannels of Example 2, to which we add a third one, $c_3(z) = 1 - \xi_3 z^{-1}$. Thus,

$$\det(C^T C) = K\left((\xi_1 - \xi_2)^2 + (\xi_1 - \xi_3)^2 + (\xi_2 - \xi_3)^2\right)$$

It clearly appears that the additional subchannel induces disparity. Note that the additional terms in the determinant result in a value greater than that for the "best" two-subchannel combination.
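The sum-of-squares structure of Example 3 can be checked in the smallest possible setting. The sketch below assumes one tap per subchannel ($N = 1$), a choice made here for simplicity and not stated in the example; in that case $C$ has one row $(1, -\xi_k)$ per subchannel and the constant $K$ equals 1. The root values are hypothetical.

```python
import numpy as np
from itertools import combinations

def gram_det(xis):
    """det(C^T C) for subchannels c_k(z) = 1 - xi_k z^{-1} with N = 1 tap each."""
    C = np.array([[1.0, -xi] for xi in xis])
    return np.linalg.det(C.T @ C)

def pairwise_sum(xis):
    """Sum over subchannel pairs of the squared root differences."""
    return sum((a - b) ** 2 for a, b in combinations(xis, 2))

xis = (0.90, 0.91, 0.50)                 # hypothetical roots, two of them close
det_three = gram_det(xis)
det_best_pair = max(gram_det(pair) for pair in combinations(xis, 2))
```

With these values `det_three` equals `pairwise_sum(xis)` and exceeds the determinant of the best pair of subchannels, so the third subchannel compensates for the two nearly common roots.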

5.  Conclusion 

We have proposed a channel disparity bound based on the channel noise power and the multichannel transfer function. It explains how spatio-temporal diversity may induce enough disparity to allow equalization with a finite-length equalizer. However, ZF equalization may perform very poorly when some subchannel roots are closer than the disparity bound, which depends on the SNR and on the other channel roots. The effect of the location of close roots relative to the unit circle was also studied: diversity matters all the more when roots are close to the unit circle, which is the difficult case in monovariate channel equalization. Further study of the improvement brought by additional diversity is under way.

We hope these simple theoretical results will help in understanding the contribution of spatio-temporal diversity in more realistic channel conditions.
Abstract

This paper addresses the blind identification of multiple input multiple output (MIMO) systems with the number of inputs strictly less than the number of outputs. In contrast to the standard FIR modelling, we assume an overall channel with an arbitrary finite-order rational transfer function. Certain quite reasonable technical hypotheses make it possible to adapt the existing linear prediction and subspace based approaches and to implement a finite-order zero-forcing equalizer in the noise-free case. The noise-free condition also yields a simple performance analysis which is quite accurate at low noise levels and provides a meaningful comparison of the proposed estimators. The robustness to additive noise is studied here by computer simulations for both techniques.

Keywords: blind identification, equalization, performance analysis.

1.  Introduction 

Convolutive properties of the propagation medium are a typical shortcoming in various applications. In digital communications they lead to severe inter symbol interference (ISI), dramatically reducing the channel capacity. Many recent publications consider the problem of blind identification, i.e. channel evaluation by analyzing the output observation, and further extraction of the input signals. Classical approaches to single input single output (SISO) identification [1, 2] usually exploit statistics of order higher than two (HOS) and are demanding in sample volume. A noticeable improvement has been achieved thanks to either multiple antennas or oversampling of the observation, see [3, 4]. Both allow the problem to be recast as single input multiple output (SIMO) identification, where second-order estimation as well as a finite-length zero-forcing equalizer (ZFE) are available. Some of these results have recently been generalized to the MIMO case [5, 6]. Most of them still treat finite-order polynomial transfer functions originating from a multipath propagation environment. However, the precise analysis of some secondary phenomena, e.g. mutual coupling of sensors/receivers, could enable a certain improvement through a more sophisticated channel modelling, especially at low noise levels.

This study is supported by CNET (France Telecom), ENST and partially by the SASPARC project of INTAS.

We consider here rational transfer functions, which usually suit most applications. As a matter of fact, a matrix-valued transfer function admits many different parametrizations (i.e. corresponding to various canonical forms, see [7]). One of the possible solutions is based on the AR factorization of the actual ARMA model, obtained as a generalization of the similar results for the multivariate FIR case [6]. An alternative approach originates from the right matrix-fraction description (MFD), which implies a two-step identification procedure with the MIMO subspace based estimator [5] followed by a linear prediction of reduced size. Both methods yield a finite-order causal ZFE providing the instantaneous mixture of source signals. Such a mixture can be treated conventionally by any of the existing source separation techniques. We focus here on the deconvolution performance, i.e. the residual ISI at the output of the ZFE. The asymptotic analysis given in section 5 establishes the residual ISI variances, which appear to be invariant to a particular channel realization. We further compare the above-mentioned estimation approaches and simulate their behaviour in the noisy case.

2.  Data  model  and  hypotheses 

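Let us consider an $M$-variate time series $\{x(t)\}_{t \in \mathbb{Z}}$ being the output of some $M \times m$ linear system with rational transfer function $H(z) = \sum_{k \geq 0} H(k) z^{-k}$ and $m$-variate input series $\{s(t)\}_{t \in \mathbb{Z}}$. The sequences $\{x(t)\}$ and $\{s(t)\}$, referred to in the sequel as observation and excitation, satisfy the linear equation written in operator form:

$$x(t) = [H(z)]\,s(t), \quad t \in \mathbb{Z}. \qquad (1)$$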

Here each entry $[H(z)]_{pq}$ may be interpreted as a transfer function between input $q$ and output $p$. In the special case of constant transfer functions $H(z) = H$ we have the instantaneous mixture separation problem. On the other hand, one deals with multichannel equalization if $m = 1$. In the most general case of linear processing, we look for a MIMO ZFE $E_H(z) = \sum_{k \geq 0} E(k) z^{-k}$ associated with channel $H(z)$ so that $[E_H(z)]\,x(t) = s(t)$, i.e. $E_H(z) H(z) = I_m$. Moreover, it is preferable to have a finite-order polynomial ZFE for practical applications. Any consistent channel estimate $\hat{H}(z)$ calculated from the finite observation $\{x(t)\}_{t=1}^{T}$ yields a sample equalizer $E_{\hat{H}}(z)$ and consistent extraction of the input signals

$$\hat{s}(t) = [E_{\hat{H}}(z)]\,x(t), \quad t \in \mathbb{Z}. \qquad (2)$$

We study the conditions providing the existence of $E_H(z)$ and the extraction accuracy subject to various channel estimators. Some further results are available under the following hypotheses:

H1 The number of inputs $m$ is strictly less than the number of outputs $M$.

H2 The emitted sequences $\{s_k(t)\}_{t \in \mathbb{Z}}$, $k = 1, \ldots, m$, are statistically independent and temporally non-correlated non-Gaussian series.

H3 The rational matrix $H(z)$ is irreducible, see [7].

More  involved  analysis  of  (H3)  provides  quite  clear 
interpretation  in  terms  of  inter-channel  diversity  and 
shows  that  this  constraint  is  met  in  typical  applica¬ 
tions,  [8].  Meanwhile  this  assumption  plays  a  key  role 
for  channel  identification  and  signals  extraction  since 
it  ensures  the  existence  of  a  finite  order  ZFE. 

Lemma 1 Let $H(z)$ be a finite-order rational function satisfying (H3). Then there exist a finite $N_E$ and an associated $E_H(z)$ with $\deg(E_H(z)) = N_E$ such that $E_H(z) H(z) = I_m$.

One can presume that the ZFE is not uniquely defined under (H1), since there exists a non-trivial left null-space of $H(z)$. The reader will see how the choice of ZFE can be adapted to the particular factorization of $H(z)$ used at the preceding channel evaluation stage.

3.  Linear  prediction  approach 

This kind of technique has been recently proposed in the mono-source context [9] and later developed for the MIMO case of FIR channels [6]. We propose a straightforward extension based on the following property.

Lemma 2 Let $H(z)$ be a finite order rational function satisfying (H3). Then there exist a finite $N_P$ and an associated $P(z) = I_M + \sum_{\tau=1}^{N_P} P(\tau) z^{-\tau}$ verifying $P(z) H(z) = H(0)$.

In other words, any irreducible ARMA channel is also an AR channel of finite order. Consequently, equation (1) can be written as follows:

$[P(z)]\,x(t) = H(0)\,s(t), \quad t \in \mathbb{Z}.$  (3)
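
To make lemma 2 concrete, the following sketch builds a first-order predictor for a toy FIR channel. The channel coefficients `h0` and `h1` are hypothetical values chosen for illustration (they do not come from the paper), and exact rational arithmetic is used so that the cancellation of the lag-1 and lag-2 terms of $P(z)H(z)$ is visible without rounding.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical FIR channel H(z) = h0 + h1 z^-1 with M = 2 outputs, m = 1 input.
h0 = [[Fr(1)], [Fr(2)]]
h1 = [[Fr(3)], [Fr(1)]]

# P(z) = I + P1 z^-1 must satisfy P1 h0 = -h1 and P1 h1 = 0,
# i.e. P1 [h0 h1] = [-h1 0]; solvable because h0 and h1 are independent.
basis = [[h0[0][0], h1[0][0]], [h0[1][0], h1[1][0]]]
target = [[-h1[0][0], Fr(0)], [-h1[1][0], Fr(0)]]
P1 = matmul(target, inv2(basis))

# Coefficients of P(z) H(z) at lags 0, 1 and 2.
lag0 = h0
lag1 = [[a[0] + b[0]] for a, b in zip(h1, matmul(P1, h0))]
lag2 = matmul(P1, h1)
print(P1, lag1, lag2)
```

Only the lag-0 coefficient $H(0)$ survives, which is the content of $P(z)H(z) = H(0)$ for this example. The construction relies on `h0` and `h1` being linearly independent, a simple instance of the diversity required by (H3).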


The prediction coefficients $P = [P(1), \dots, P(N_P)]$ and the innovation covariance matrix $D = H(0) H(0)^H$ may be consistently estimated by solving a multivariate Yule-Walker equation with the empirical counterpart of the block-Toeplitz spatio-temporal covariance matrix:

$R_x = \{R_x(i - j)\}_{i,j=0}^{N}, \quad R_x(\tau) = E\{x(t)\,x(t - \tau)^H\}.$  (4)

Some more details on calculating $\hat{P}$ and $\hat{D}$, the consistent estimates of $P$ and $D$, can be found in [10]. Notice also that a similar estimation procedure is valid when the observation $\{x(t)\}_{t \in \mathbb{Z}}$ is corrupted by an additive temporally white noise with known spatial structure, since a consistent estimate of $R_x$ is still available.

Now any $M \times m$ square root $F(0)$ of $D$ such that $F(0) F(0)^H = D$ verifies $H(0) = F(0)\,\Theta$ with some unitary $m \times m$ matrix $\Theta$. Let us consider the FIR filter $E_F(z) = F(0)^{\#} P(z)$. According to (3), we have $[E_F(z)]\,x(t) = \Theta\,s(t)$, i.e. $E_F(z)$ is a kind of ZFE providing an instantaneous mixture of the source signals. Due to (H2), further extraction of each source signal $s_k(t)$ can be completed by HOS source separation techniques. Let us denote by $\hat{Q}$ some consistent estimate of the $m \times m$ separator, obviously verifying $\lim_{T \to \infty} \hat{Q} = \Theta^H$. We finally apply a finite order $N_P$ FIR filter

$\hat{E}_H(z) = \hat{Q}\,\hat{F}(0)^{\#}\,\hat{P}(z)$  (5)

to the observation series as indicated in (2). Due to the strict consistency of each empirical quantity, $\lim_{T \to \infty} \hat{s}(t) = s(t)$.
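
The unitary ambiguity left by the square root of $D$ can be illustrated numerically. The sketch below is only an illustration under stated assumptions: real-valued quantities (so the Hermitian transpose reduces to a transpose), hypothetical sizes, a randomly drawn matrix standing in for $H(0)$, and NumPy as the numerical library. The square root is taken from the dominant eigenpairs of $D$, which is one possible choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)
M, m = 4, 2                        # hypothetical sizes with m < M
H0 = rng.standard_normal((M, m))   # stands in for the unknown H(0)
D = H0 @ H0.T                      # innovation covariance, rank m

# Square root built from the m dominant eigenpairs of D.
eigval, eigvec = np.linalg.eigh(D)           # eigenvalues in ascending order
F0 = eigvec[:, -m:] * np.sqrt(eigval[-m:])   # M x m, with F0 F0^T = D

# Residual mixing left after applying the pseudo-inverse of F0.
Theta = np.linalg.pinv(F0) @ H0
print(np.allclose(F0 @ F0.T, D), np.allclose(Theta @ Theta.T, np.eye(m)))
```

The residual matrix `Theta` is orthogonal, so second-order statistics alone cannot resolve it; this is where the higher-order separation step enters.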

4.  Generalized  subspace  approach 

This method stems from the canonical right MFD of rational functions. Let us denote by $\mathcal{S}(z)$ the column space of $H(z)$, i.e. $\mathcal{S}(z) = \mathrm{span}\{H(z)\}$. In the most general case one can deduce the following result, see [11] for definitions.

Lemma 3 Let (H3) hold and let the polynomial $M \times m$ matrix $B(z)$ be any minimal polynomial basis (MPB) of $\mathcal{S}(z)$ with invariant column degrees $\deg([B(z)]_k) = L_k$, $L_1 \le \dots \le L_m$. Then there exists an $m \times m$ polynomial $C(z)$ of finite degree $N_C$ and full rank almost everywhere in $\mathbb{C}$ such that $H(z) = B(z)\,C(z)^{-1}$.

According to lemma 3, the identification procedure can now be accomplished in two steps: (i) identify any MPB of $\mathcal{S}(z)$; (ii) identify the associated $C(z)$. Let us focus on the first stage. We denote by $\{B(z), C(z)\}$ any arbitrary pair satisfying lemma 3, i.e. $H(z) = B(z)\,C(z)^{-1}$. Now the observation series can be rewritten as $x(t) = [B(z)]\,v(t)$, where $v(t) = [C(z)^{-1}]\,s(t)$. Notice that $B(z)$ is some MPB of $\mathcal{S}(z)$ and $v(t)$ has a full-rank covariance matrix $R_v$ of any order (see definition (4)). As indicated in [5], one can perfectly identify some MPB of $\mathcal{S}(z)$ from a finite observation sample. Such an MPB may be consistently estimated in the noisy case. For more details concerning the estimation of a particular MPB we refer the reader to [12].

Let us denote by $\hat{B}(z)$ a consistent estimate of some MPB $B(z)$. As shown in [11], any MPB matches (H3), and according to lemma 1 there exists a finite order $E_B(z)$ verifying $E_B(z) B(z) = I_m$. In practice such a left inverse of $B(z)$ can be calculated from the algebraic analog of (1): $[x(t)^T, \dots, x(t - N_B)^T]^T = \mathcal{T}_{N_B}(B)\,[v(t)^T, \dots]^T$, where $\mathcal{T}_{N_B}(B)$ is a generalized Sylvester matrix of order $N_B$ associated with the polynomial $B(z)$, see [7], and the right-hand vector stacks the delayed samples of $v(t)$. The input signal $v(t)$ of $B(z)$ can be extracted by applying the $m \times M(N_B + 1)$ matrix $E_B = [E_B(0), \dots, E_B(N_B)]$ to $[x(t)^T, \dots, x(t - N_B)^T]^T$ when $N_B > \sum_{k=1}^{m} L_k$. In this case $E_B$ is calculated from the left pseudo-inverse of $\mathcal{T}_{N_B}(B)$ by taking its $m$ upper rows. Its consistent estimate $\hat{E}_B$ can be readily obtained using the empirical quantity $\hat{B}(z)$ instead of the true one. Further pre-filtering provides the intermediate output signal

$\hat{v}(t) = [\hat{E}_B(z)]\,x(t), \quad \lim_{T \to \infty} \hat{v}(t) = [C(z)^{-1}]\,s(t).$  (6)
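
The left inverse obtained from the Sylvester matrix can be sketched on a small case. Everything below is an assumption made for illustration: a hypothetical single-column basis $B(z) = b_0 + b_1 z^{-1}$ (so $M = 2$, $m = 1$, $L_1 = 1$), a window $N_B = 2$, real-valued data, and NumPy for the pseudo-inverse. The block layout of `T` is one common convention for a Sylvester matrix and may differ in detail from the construction of [7].

```python
import numpy as np

# Hypothetical MPB B(z) = b0 + b1 z^-1 with M = 2, m = 1, column degree L1 = 1.
b0 = np.array([[1.0], [2.0]])
b1 = np.array([[3.0], [1.0]])
zero = np.zeros((2, 1))

# Window N_B = 2: [x(t); x(t-1); x(t-2)] = T [v(t); v(t-1); v(t-2); v(t-3)].
T = np.block([[b0, b1, zero, zero],
              [zero, b0, b1, zero],
              [zero, zero, b0, b1]])

# m upper rows of the left pseudo-inverse: a 1 x 6 filter E_B.
E_B = np.linalg.pinv(T)[:1, :]
print(E_B @ T)
```

Because `b0` and `b1` are not proportional, `T` has full column rank and the product `E_B @ T` selects the first entry of the stacked input, i.e. $v(t)$ is recovered exactly from three consecutive observation vectors.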

The described preliminary processing forms the kernel of the generalized subspace approach since it allows one to reduce the initial problem to the identification of the $m \times m$ polynomial matrix $C(z)$. One should notice that, due to the perfect identification of the MPB $B(z)$ in the noise-free case, the intermediate output $\hat{v}(t) = [\hat{E}_B(z)]\,x(t)$ satisfies $\hat{v}(t) = v(t)$, so that the generalized subspace estimator is statistically equivalent to the estimation of $C(z)$ from the series $\{v(t)\}$.

Let us consider the identification of $C(z)$. Due to (H3), its zero coefficient $C(0) := \lim_{z \to \infty} C(z)$ is nonsingular and we can define the matrix $A(z) = I_m + \sum_{\tau=1}^{N_C} A(\tau) z^{-\tau}$ such that $A(z) = C(0)^{-1} C(z)$. It is easy to check that $v(t)$ and $s(t)$ verify the following equation

$[A(z)]\,v(t) = C(0)^{-1} s(t),$  (7)

i.e. $v(t)$ is an AR process with the prediction coefficients $A = [A(1), \dots, A(N_C)]$ and the innovation covariance matrix $D' = C(0)^{-1} C(0)^{-H}$. Obviously, further identification of model (7) may be accomplished by means of the linear prediction approach described in section 3, where $v(t)$ is now treated as the observation. Let us denote by $\hat{A}$ the estimate of $A$, by $\hat{F}'(0)$ the empirical square root of $D'$ and by $\hat{Q}'$ some $m \times m$ separator estimate. Similarly to (5) we obtain $\hat{s}(t) = \hat{Q}'\,\hat{F}'(0)^{-1}\,[\hat{A}(z)]\,\hat{v}(t)$. Finally, plugging in $\hat{v}(t)$ from (6), the complete ZFE is found to be

$\hat{E}_H(z) = \hat{Q}'\,\hat{F}'(0)^{-1}\,\hat{A}(z)\,\hat{E}_B(z).$  (8)

This filter provides the consistent extraction of source signals from the observation according to (2), i.e. $\lim_{T \to \infty} \hat{s}(t) = s(t)$, similarly to (5).
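
As a consistency check on (8), one may replace every estimate by its limit and verify that the cascade is indeed a ZFE. The derivation below is a sketch under that assumption; $\Theta'$ is a symbol introduced here (it does not appear in the text) for the unitary factor relating $C(0)^{-1}$ to its square root $F'(0)$, by analogy with $\Theta$ in section 3.

```latex
\begin{align*}
E_H(z)\,H(z)
  &= Q'\,F'(0)^{-1} A(z)\,E_B(z)\,B(z)\,C(z)^{-1} \\
  &= Q'\,F'(0)^{-1} A(z)\,C(z)^{-1}
     && \text{since } E_B(z) B(z) = I_m \\
  &= Q'\,F'(0)^{-1} C(0)^{-1}
     && \text{since } A(z) = C(0)^{-1} C(z) \\
  &= Q'\,\Theta'
     && \text{writing } C(0)^{-1} = F'(0)\,\Theta',\ \Theta' \text{ unitary} \\
  &= I_m
     && \text{since } Q' \text{ tends to } \Theta'^{H}.
\end{align*}
```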


5.  Performance  analysis 

In this section we compare the statistical efficiency of both identification techniques in the noise-free case. As follows from section 4, the generalized subspace estimator allows perfect identification of the factor $B(z)$, i.e. the only error is caused by the estimation of $C(z)$, e.g. by linear prediction. Therefore we just need to compare:

(i) extraction (2) via linear prediction according to (5);

(ii) extraction of source signals from $v(t)$ via linear prediction according to the model (7).

For this purpose we use some general asymptotic results concerning signal extraction via linear prediction in the noise-free case. Let $\{y(t)\}_{t \in \mathbb{Z}}$ be an $M$-variate time series satisfying the AR equation $[\mathcal{P}(z)]\,y(t) = \Pi\,s(t)$ with $\mathcal{P}(z)$ any prediction filter of order not more than $N_P$ and some $M \times m$ full rank matrix $\Pi$. We further assume the identification procedure described in section 3 and the associated estimate $\hat{E}(z) = \sum_{\tau=0}^{N_P} \hat{E}(\tau) z^{-\tau}$ of the ZFE defined according to (5), with $\hat{Q}$ providing the consistent signal estimate $\hat{s}(t) = [\hat{E}(z)]\,y(t)$. As a matter of fact, consistent extraction implies that the global causal transfer function $\hat{\Gamma}(z) = \hat{E}(z)\,\mathcal{P}(z)^{-1}\,\Pi$, such that $\hat{s}(t) = [\hat{\Gamma}(z)]\,s(t)$, verifies $\lim_{T \to \infty} \hat{\Gamma}(z) = I_m$, i.e. its Fourier coefficients satisfy $\lim_{T \to \infty} \hat{\Gamma}(\tau) = 0$ for $\tau > 0$. The equalization errors, defined as the residual convolutive contribution to $\hat{s}(t)$, $\Delta s(t) = \sum_{\tau > 0} \hat{\Gamma}(\tau)\,s(t - \tau)$, will be considered throughout this paper as a performance index for the deconvolution techniques, taking into account that the residual separation error essentially depends upon the source separation technique.

Theorem 1 The asymptotic equalization errors $\Delta s(t)$ verify $\lim_{T \to \infty} T\,E\{\Delta s(t)\,\Delta s(t)^H\} = r\,I_m$, where $r$ is the rank of $R_y$ of order $N_P - 1$.

Notice that the equalization error variances, later referred to as the equalization rates, are asymptotically invariant to the instantaneous separation performance as well as to the system parameters; they depend only upon the rank of $R_y$. We further denote by $r = r(R_y, N)$ the rank of $R_y$ having the order $N - 1$, $N > 0$. Let $N_{P\min}$ be the minimum prediction order. Then for any $N_P > N_{P\min}$ we have $r(R_y, N_P) > r(R_y, N_{P\min})$, i.e. order overestimation always leads to performance degradation. To compare the potential efficiency of both methods one certainly needs to know the minimum orders $N_P$ and $N_C$, which depend non-trivially upon $H(z)$. A more involved study of these quantities for typical channel realizations is currently under investigation. On the other hand, in the case of unknown $N_P$ and $N_C$ one can estimate the channel via both of the designed methods if the observation window $N'_P$ is chosen sufficiently large. More precisely, we assume that the windows of the linear prediction approach and of the subspace method are chosen equal to $N'_P$. The corresponding number of observable input samples including each source signal, equal to $N'_P + L_1$ (see section 4), provides us with the window for the secondary linear prediction, i.e. the identification of $C(z)$. It is easy to show that $r(R_x, N'_P) = m N'_P + \sum_{k=1}^{m} L_k \ge r(R_v, N'_P + L_1) = m (N'_P + L_1)$, i.e. the performance of the generalized subspace technique is not worse than the performance of the pure linear prediction. The equality holds only if $L_1 = \dots = L_m$, e.g. when $m = 1$.
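
The rank comparison can be tried on numbers. The sketch below reads the two rank expressions as $m N'_P + \sum_k L_k$ for the pure linear prediction and $m (N'_P + L_1)$ for the secondary prediction, with $L_1$ taken as the smallest column degree; this reading, the function names and the sample values are assumptions made for illustration.

```python
def rank_prediction(m, window, degrees):
    """Rank governing pure linear prediction: m * N'_P + sum of column degrees."""
    return m * window + sum(degrees)

def rank_subspace(m, window, degrees):
    """Rank governing the secondary prediction: m * (N'_P + smallest degree)."""
    return m * (window + min(degrees))

m, window = 2, 5                        # hypothetical values
results = {}
for degrees in ([1, 3], [2, 2]):
    results[tuple(degrees)] = (rank_prediction(m, window, degrees),
                               rank_subspace(m, window, degrees))
print(results)
```

With unequal degrees the subspace route yields the smaller rank, hence by theorem 1 the smaller equalization rate; with equal degrees the two ranks coincide.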

6.  Simulations 

We present in this section some numerical examples validating the theoretical results on both estimators in the noise-free case and their robustness to additive noise. The overall propagation channel has been modelled in a left MFD form, $H(z) = A(z)^{-1} B(z)$, so that $[A(z)]\,x(t) = [B(z)]\,s(t)$. In digital multi-sensor communication, $B(z)$ reflects the FIR propagation medium between the user and the reception site, while $A(z)$ might correspond to the mutual coupling of receivers. We compare the linear prediction and generalized subspace approaches completed by a joint diagonalization source separation procedure, see [13]. For system dimensions $M = 4$, $m = 2$ and observation sample size $T = 500$, we plot the residual equalization rates versus the degree of the denominator $A(z)$ for $\deg(B(z)) = 2$; versus the degree of the first column of the numerator, $\deg([B(z)]_1)$, for $\deg([B(z)]_2) = 0$ and $\deg(A(z)) = 2$; and versus the average signal-to-noise ratio for $\deg(A(z)) = \deg(B(z)) = 2$. Each simulated value is equipped with a confidence interval of $\pm 2$ empirical standard deviations and the true value.

As follows from Fig. 1, the actual performance of the two methods might be slightly different, in favour of the subspace technique. On the other hand, the latter displays fatal degradation in the presence of noise: the values from 10 dB to 2bdB lead to abnormal errors in many cases even when the degrees $L_k$ are perfectly known, identification of these quantities being a particular problem in the noisy case. Meanwhile, the linear prediction approach appears to be robust at high and moderate signal-to-noise ratio levels.

Abstract

Subspace based estimates, i.e. estimates obtained by exploiting the orthogonality between a set of vector statistics and a set of parameter-dependent vectors, have gained much popularity in the signal processing literature. The purpose of this contribution is to develop a general theory for such estimates. In particular, we discuss the generalization of the optimal weighted subspace fitting approach, introduced by Viberg [2] in the DOA estimation context. We then establish that the optimally weighted estimate enjoys some invariance properties.

1.  SUBSPACE  FITTING  ESTIMATION 

This  section  is  concerned  with  general  properties  of  sub¬ 
space  fitting  estimates:  we  define  a  general  framework  and 
give  the  asymptotic  performance  of  (optimal)  subspace  es¬ 
timates. 

1.1.  Assumptions  and  notations 

The common strand behind subspace estimation is to exploit the geometrical property of a certain matrix-valued statistic for estimating unknown parameters. Perhaps the best-known example of such techniques is the so-called Pisarenko's method, which makes use of the eigen-subspace of a certain covariance matrix to estimate the frequencies of sine waves in white noise. These methods have gained much popularity in the signal processing community in the last decade, and have been applied successfully to a variety of problems, such as the estimation of directions of arrival in narrow-band array processing [1, 2], or more recently system identification [3, 4, 5]. As seen below, this theory can be formulated in fairly general terms.

Consider a parametric statistical model where the distribution of $n$ observations $y_1, \dots, y_n$ depends on a parameter vector $\omega = (\theta, \mu) \in \Omega$, with $\theta \in \Theta$, where $\Theta$ is a compact set of real parameter vectors. Here, $\theta$ is the parameter of interest and $\mu$ is a nuisance parameter (the values of $\mu$ are needed to make inferences about $\theta$ even though they have little informative import of their own).

A matrix-valued statistic $N_n$ is computed from $y_1, \dots, y_n$. This statistic forms the basis for inferring the parameter of interest $\theta$. Actually, we make no assumption on the distribution of the data themselves, but only on the asymptotic distribution of $N_n$:

Assumption 1 (Asymptotic normality) For all $\omega \in \Omega$, $N_n$ is asymptotically normal with asymptotic mean $N(\omega)$ and some asymptotic covariance matrix.


This assumption uses the following convention: a sequence $Y_n$ of random $r \times q$ matrices is said to be asymptotically normal with asymptotic mean $Y$ and asymptotic covariance matrix $C$ if the $rq \times 1$ random vector $\sqrt{n}\,(\mathrm{Vec}(Y_n) - \mathrm{Vec}(Y))$ tends in distribution to a zero-mean normal random vector with covariance matrix $C$. We write: $Y_n \sim \mathcal{AN}(Y, C)$.

Subspace fitting estimation is relevant when $N_n$ converges to a rank deficient matrix $N(\omega)$ and when there exists a matrix-valued function $S(\theta)$ of size $r \times p$, depending only on the parameter of interest and satisfying the following assumption.

Assumption 2 (Identifiability) For any $\omega = (\theta, \mu) \in \Omega$,

$S^T(\theta')\,N(\omega) = 0 \Leftrightarrow \theta' = \theta.$  (1)

Hence the basic mechanism of subspace fitting, which consists in obtaining an estimate $\hat{\theta}$ of $\theta$ such that the columns of $S(\hat{\theta})$ are 'as orthogonal as possible' to the columns of $N_n$, as detailed in the next section. Note that we do not require that $\mathrm{Span}(N(\theta, \mu))$ and $\mathrm{Span}(S(\theta))$ are orthogonal complements.

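The following notational conventions hold throughout. First, bold face letters will denote values of functions of $\theta$ taken at the 'true value' of the parameters. In particular, we denote

$\mathbf{S} = S(\theta), \quad \mathbf{N} = N(\theta, \mu).$  (2)

It is often necessary to collect derivatives of matrix-valued functions w.r.t. $\theta$ into a single larger matrix. A suggestive notation is needed for this construction. We will typically denote:

$[\mathbf{S}] = [\mathrm{Vec}(\partial S / \partial\theta_1) \ \cdots \ \mathrm{Vec}(\partial S / \partial\theta_l)],$  (3)

$[\mathbf{S}^T \mathbf{N}] = [\mathrm{Vec}((\partial S / \partial\theta_1)^T \mathbf{N}) \ \cdots \ \mathrm{Vec}((\partial S / \partial\theta_l)^T \mathbf{N})],$  (4)

where all the quantities are evaluated at point $\theta$ or $(\theta, \mu)$ and $l$ is the number of components of $\theta$. Since matrix $S(\theta)$ has size $r \times p$, matrix $[\mathbf{S}]$ has size $rp \times l$. Finally, with several asymptotically normal matrix sequences appearing in the following, it will be convenient to write $C = \mathrm{Cov}(Y)$ whenever $Y_n \sim \mathcal{AN}(Y, C)$ under the distribution $\omega = (\theta, \mu)$.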

This study being restricted to regular (root-$n$ convergence) estimation, we impose some regularity on functions $S(\cdot)$ and $N(\cdot)$ and also want to exclude cases where some linear combination of the parameters can be estimated at a super-efficient rate.

Assumption 3 (Regularity) Functions $S(\cdot)$ and $N(\cdot)$ are differentiable with respect to $\theta$ at point $\omega = (\theta, \mu)$ and

$\mathrm{Span}([\mathbf{S}^T \mathbf{N}]) \subset \mathrm{Span}(\mathrm{Cov}(\mathbf{S}^T N_n)).$  (5)

Loosely speaking, this regularity assumption means that there is no direction in the parameter space which is not excited by the matrix errors $\mathbf{S}^T N_n$.

Definition 1 A pair $(S(\cdot), N_n)$ is said to be admissible for subspace fitting estimation if it satisfies assumptions 1-3.

Before proceeding, we stress that it is assumed neither that $\mathrm{Span}(\mathbf{N}) \oplus \mathrm{Span}(\mathbf{S}) = \mathbb{R}^r$ nor that $\mathbf{N}$ or $\mathbf{S}$ have full column rank. If it holds that $\mathrm{Span}(\mathbf{N}) \oplus \mathrm{Span}(\mathbf{S}) = \mathbb{R}^r$, we say that a 'saturation condition' is fulfilled. In this case, additional properties of subspace estimates can be obtained (see section 2.4).

1.2.  Subspace  fitting  estimates 

Subspace fitting estimates are obtained as the minimizers of a criterion quantifying the orthogonality between the range space of $N_n$ and the range space of $S(\eta)$:

$\hat{\theta}^W = \arg\min_{\eta} \|S^T(\eta)\,N_n\|_W^2$  (6)

where $W$ is a (possibly rank deficient) symmetric non-negative matrix and $\|\cdot\|_W$ is the weighted Euclidean norm, $\|M\|_W^2 = \mathrm{Vec}(M)^T\,W\,\mathrm{Vec}(M)$. For a given admissible pair $(S(\cdot), N_n)$, the weighting matrix $W$ must be chosen to preserve the identifiability of assumption 2: a matrix $W$ is said to be admissible for the value $(\theta, \mu)$ of the parameters if it preserves identifiability, i.e. if $\|S^T(\theta')\,N(\theta, \mu)\|_W = 0 \Leftrightarrow \theta' = \theta$.

With this definition, we can state the following theorem.

Theorem 1 If $(S(\cdot), N_n)$ is an admissible pair and matrix $W$ is admissible at point $(\theta, \mu)$ for this pair, then $\hat{\theta}^W$ defined by (6) is a consistent estimate of $\theta$.
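
A toy instance of criterion (6) may help fix ideas. In the sketch below everything is hypothetical and chosen only for illustration: a scalar parameter, a probing function $S(\eta) = (\cos\eta, \sin\eta)^T$, a rank-one limit statistic whose columns are orthogonal to $S(\theta)$, a small pseudo-random perturbation standing in for estimation noise, identity weighting, and a plain grid search in place of a proper optimizer.

```python
import math
import random

def probe(eta):
    """Hypothetical probing function S(eta), a 2 x 1 matrix stored as a list."""
    return [math.cos(eta), math.sin(eta)]

def criterion(eta, stat, weight):
    """||S(eta)^T N_n||_W^2 = Vec(M)^T W Vec(M) for the 1 x q matrix M."""
    s = probe(eta)
    vec = [s[0] * stat[0][j] + s[1] * stat[1][j] for j in range(len(stat[0]))]
    return sum(vec[i] * weight[i][j] * vec[j]
               for i in range(len(vec)) for j in range(len(vec)))

theta = 0.7                                    # true parameter (assumed)
normal = [-math.sin(theta), math.cos(theta)]   # direction orthogonal to S(theta)
gains = [1.0, 0.5]
rng = random.Random(0)
# Rank-one limit N plus a small perturbation standing in for estimation noise.
stat = [[normal[i] * g + 1e-3 * rng.gauss(0.0, 1.0) for g in gains]
        for i in range(2)]

weight = [[1.0, 0.0], [0.0, 1.0]]              # identity weighting
grid = [k * 1e-3 for k in range(0, 1571)]      # compact set [0, pi/2]
estimate = min(grid, key=lambda eta: criterion(eta, stat, weight))
print(estimate)
```

Here $S^T(\eta)\,N = \sin(\eta - \theta)\,[1, 0.5]$ vanishes on the grid only at $\eta = \theta$, so the identifiability assumption holds and the minimizer lands close to the true value when the perturbation is small.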

Of course, any fixed positive definite matrix $W$ is admissible for any value of the parameters, the most straightforward choice being the identity matrix. As shown below, in the context of interest, 'optimal' weighting matrices are often rank deficient (in order to null out 'spurious' error terms) and the null space of the weighting matrix will generally depend on $\theta$. This fact may appear problematic since the value of $\theta$ is unknown. However, one may show that a consistent estimate of the optimal weighting matrix can be used without affecting the asymptotic performance.

Weighted  subspace  estimates  are  asymptotically  charac¬ 
terized  as  follows. 

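Theorem 2 If $(S(\cdot), N_n)$ is an admissible pair, then for any admissible weighting matrix $W$, the sequence $\{\hat{\theta}^W\}$ of estimates admits the stochastic expansion $\hat{\theta}^W = \theta + \xi^W + o_p(n^{-1/2})$ with

$\xi^W = -([\mathbf{S}^T \mathbf{N}]^T W [\mathbf{S}^T \mathbf{N}])^{-1} [\mathbf{S}^T \mathbf{N}]^T W\,\mathrm{Vec}(\mathbf{S}^T N_n).$  (7)

It follows that $\hat{\theta}^W \sim \mathcal{AN}(\theta, C_W)$ with asymptotic covariance matrix:

$C_W = ([\mathbf{S}^T \mathbf{N}]^T W [\mathbf{S}^T \mathbf{N}])^{-1} [\mathbf{S}^T \mathbf{N}]^T W\,\mathrm{Cov}(\mathbf{S}^T N_n)\,W [\mathbf{S}^T \mathbf{N}] ([\mathbf{S}^T \mathbf{N}]^T W [\mathbf{S}^T \mathbf{N}])^{-1}.$  (8)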
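
The step from (7) to (8) is the usual covariance rule for a linear map of an asymptotically normal vector. As a sketch, with the shorthand $Q$ and $G$ introduced here for readability (they are not part of the original notation):

```latex
\begin{align*}
Q &= [\mathbf{S}^T \mathbf{N}], \qquad G = (Q^T W Q)^{-1} Q^T W, \\
\sqrt{n}\,(\hat{\theta}^W - \theta)
  &= \pm\, G\,\sqrt{n}\,\mathrm{Vec}(\mathbf{S}^T N_n) + o_p(1), \\
C_W &= G\,\mathrm{Cov}(\mathbf{S}^T N_n)\,G^T
     = (Q^T W Q)^{-1} Q^T W\,\mathrm{Cov}(\mathbf{S}^T N_n)\,W Q\,(Q^T W Q)^{-1}.
\end{align*}
```

The last equality uses the symmetry of $W$, and the overall sign in the second line plays no role in $C_W$.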

Covariance matrix $C_W$ depends on the choice of the weighting matrix $W$. This raises the issue of an optimal choice of the weighting matrix, i.e. the existence of an optimal matrix $W_*$ such that $C_{W_*} \le C_W$ for all admissible $W$, this inequality being understood in terms of the partial ordering of the Hermitian matrices. The following lemma allows one to conclude easily about optimality.


Lemma 1 Let $Q$ and $\Gamma$ be two matrices with the same number of rows, with $Q^T Q$ invertible and $\Gamma$ a non-negative symmetric matrix. If $\mathrm{Span}(Q) \subset \mathrm{Span}(\Gamma)$, then for any symmetric matrix $W$ such that $Q^T W Q$ is invertible, it holds that

$(Q^T W Q)^{-1}\,Q^T W\,\Gamma\,W Q\,(Q^T W Q)^{-1} \ge (Q^T \Gamma^{\#} Q)^{-1}.$  (9)

This is a classic inequality, holding unconditionally when matrix $\Gamma$ is full-rank. However, our purpose requires dealing with possibly singular matrices $\Gamma$; as shown by the lemma, the inequality still holds provided the range of $\Gamma$ is 'large enough'. Straightforward application of lemma 1 with $\Gamma = \mathrm{Cov}(\mathbf{S}^T N_n)$ and $Q = [\mathbf{S}^T \mathbf{N}]$ yields the following optimality theorem.

Theorem 3 For an admissible pair $(S(\cdot), N_n)$ and any admissible weighting matrix $W$, $C_W \ge (J^{S,N})^{-1}$ where

$J^{S,N} = [\mathbf{S}^T \mathbf{N}]^T\,\mathrm{Cov}^{\#}(\mathbf{S}^T N_n)\,[\mathbf{S}^T \mathbf{N}]$  (10)

and this lower bound to the asymptotic covariance is reached for $W = \mathrm{Cov}^{\#}(\mathbf{S}^T N_n)$.
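
Inequality (9) and the attainment of the bound by the pseudo-inverse weight can be checked numerically on a random instance. The sketch below is only an illustration: the sizes, the random draws and the tolerance are arbitrary assumptions, NumPy is assumed available, and a singular $\Gamma$ is used on purpose so that the range condition of lemma 1 matters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d = 6, 4, 2                            # hypothetical sizes, d <= k < n
L = rng.standard_normal((n, k))
Gamma = L @ L.T                              # singular non-negative matrix, rank k
Q = Gamma @ rng.standard_normal((n, d))      # columns inside Span(Gamma)
Gamma_pinv = np.linalg.pinv(Gamma, 1e-10)    # pseudo-inverse with explicit cutoff

def sandwich(W):
    """Left-hand side of (9) for a symmetric weight W."""
    A = np.linalg.inv(Q.T @ W @ Q)
    return A @ Q.T @ W @ Gamma @ W @ Q @ A

bound = np.linalg.inv(Q.T @ Gamma_pinv @ Q)  # right-hand side of (9)
X = rng.standard_normal((n, n))
W_any = X @ X.T + np.eye(n)                  # an arbitrary positive definite weight

gap = sandwich(W_any) - bound
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()
attained = np.allclose(sandwich(Gamma_pinv), bound)
print(min_eig >= -1e-8, attained)
```

The smallest eigenvalue of the gap is non-negative up to rounding, and the weight $\Gamma^{\#}$ reaches the bound, which mirrors the statement of theorem 3 with $\Gamma = \mathrm{Cov}(\mathbf{S}^T N_n)$.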

A few comments are in order. First, this result obviously parallels the theory of maximum likelihood estimation in regular statistical models. Here $J^{S,N}$ plays the role of the Fisher information matrix in the M.L. framework. It apparently depends not only on the statistics of $N_n$ but also on the particular function $S$ used to express the orthogonality between subspaces. The next section is devoted to establishing that this lower bound does not depend on functions $S$ and $N$ but only on 'subspace quantities'. Second, it must be stressed that there may exist many different weighting matrices attaining the lower bound. In other words, $C_W = C_{W_*}$ does not necessarily imply that $W = W_*$. Finally, the optimal weighting depends on the parameter $\theta$, but it may be shown that its substitution by a consistent estimate does not affect the asymptotic distribution of $\hat{\theta}^{W_*}$.

2.  INVARIANCE  OF  SUBSPACE  FITTING 
ESTIMATES 

This  section  is  devoted  to  establishing  two  invariance  prop¬ 
erties  related  to  optimal  subspace  fitting.  The  basic  intu¬ 
ition  is  that  optimal  procedures  ‘tend’  to  be  independent  of 
specific  parameterizations.  For  instance,  estimation  based 
on  optimal  matching  of  a  statistic  is  invariant  under  invert¬ 
ible  transformation  of  the  statistic.  Since  subspace  fitting 
estimation  is  ultimately  based  on  orthogonality  between 
spaces,  it  may  be  expected  that  the  behavior  of  optimal 
estimates  is  governed  only  by  ‘subspace  quantities’. 

2.1.  Pseudo-scores 

Our approach is based on first-order stochastic expansions of estimates: if a sequence $\{\hat{\theta}_n\}$ of estimates of a parameter $\theta$ can be written as

$\hat{\theta}_n = \theta - \Gamma_u^{-1} u_n + o_p(n^{-1/2}),$  (11)

where $\{u_n\} \sim \mathcal{AN}(0, \Gamma_u)$, then $\{u_n\}$ is said to be a sequence of pseudo-scores for $\{\hat{\theta}_n\}$. This terminology is clearly in analogy to classical M.L. estimation theory. It is easily found that if a sequence $\{\hat{\theta}_n\}$ admits a sequence of pseudo-scores $\{u_n\} \sim \mathcal{AN}(0, \Gamma_u)$, then $\{\hat{\theta}_n\} \sim \mathcal{AN}(\theta, \Gamma_u^{-1})$. In the following, we shall establish invariance properties by comparing pseudo-scores. Clearly, if two estimators are associated to pseudo-scores differing only by an $o_p(n^{-1/2})$ term, they have the same asymptotic distribution.

Pseudo-scores of optimally weighted subspace fitting estimates have a characteristic form:

Theorem 4 The optimally weighted subspace fitting estimate based on an admissible pair $(S(\cdot), N_n)$ admits a sequence of pseudo-scores given by

$u_n^{(S,N)} = [\mathbf{S}^T \mathbf{N}]^T\,\mathrm{Cov}^{\#}(\mathbf{S}^T N_n)\,\mathrm{Vec}(\mathbf{S}^T N_n).$  (12)

This form of the pseudo-score suggests that matrix $\mathbf{S}$ may factor out, leaving an expression of the pseudo-score not depending explicitly on function $S(\cdot)$. A key device for this kind of manipulation is the following lemma.

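Lemma 2 Let $\{v_n\}$ denote an asymptotically normal sequence: $\{v_n\} \sim \mathcal{AN}(0, \mathrm{Cov}(v))$. For two matrices $A$ and $B$ with compatible dimensions, the sequences

$A^T\,\mathrm{Cov}^{\#}(v)\,v_n$  (13)

and

$A^T B\,(B^T \mathrm{Cov}(v)\,B)^{\#}\,B^T v_n$  (14)

are equivalent, i.e. they differ only by an $o_p(n^{-1/2})$ term, if the following two conditions hold:

$\mathrm{rank}(B^T \mathrm{Cov}(v)) = \mathrm{rank}(\mathrm{Cov}(v)),$  (15)

$\mathrm{Span}(A) \subset \mathrm{Span}(\mathrm{Cov}(v)).$  (16)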
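
The cancellation stated in lemma 2 can be observed on a finite-dimensional example. In the sketch below, a vector lying exactly in the range of a singular covariance stands in for the asymptotically normal sequence; sizes and random draws are arbitrary assumptions and NumPy is assumed available. Under conditions (15) and (16) the two expressions (13) and (14) then coincide up to rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, p = 6, 3, 4                          # hypothetical sizes, k <= p
L = rng.standard_normal((n, k))
C = L @ L.T                                # singular Cov(v), rank k
A = C @ rng.standard_normal((n, 2))        # Span(A) inside Span(C): condition (16)
B = rng.standard_normal((n, p))            # generic B, so rank(B^T C) = rank(C): condition (15)
v = L @ rng.standard_normal(k)             # a draw lying in Span(C)

lhs = A.T @ np.linalg.pinv(C, 1e-10) @ v                       # expression (13)
rhs = A.T @ B @ np.linalg.pinv(B.T @ C @ B, 1e-10) @ B.T @ v   # expression (14)

rank_ok = np.linalg.matrix_rank(B.T @ C, 1e-8) == np.linalg.matrix_rank(C, 1e-8)
print(rank_ok, np.allclose(lhs, rhs))
```

Note that `B` is not square, so it cannot simply be inverted; the agreement of the two expressions comes entirely from the rank and range conditions.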

2.2.  Invariance  with  respect  to  the  probe 
We start by proving invariance with respect to the probing function $S(\cdot)$. Denote by $P_S(\eta)$ the orthogonal projector onto $\mathrm{Span}(S(\eta))$ and by $\mathbf{P}_S$ its value at the true parameter $\eta = \theta$:

$P_S(\eta) = S(\eta)\,(S(\eta)^T S(\eta))^{\#}\,S(\eta)^T, \quad \mathbf{P}_S = P_S(\theta).$  (17)

We want to relate estimates based on the pair $(S(\cdot), N_n)$ to estimates based on the pair $(P_S(\cdot), N_n)$. The first property to be established is that estimates based on the pair $(P_S(\cdot), N_n)$ do exist. This is guaranteed by the following theorem.

Theorem 5 If $(S(\cdot), N_n)$ is admissible, so is $(P_S(\cdot), N_n)$.

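It follows that theorem 4 applies to the optimally weighted subspace estimate based on the pair $(P_S(\cdot), N_n)$. According to theorem 4, a pseudo-score associated to the pair $(P_S(\cdot), N_n)$ is:

$u_n^{(P_S,N)} = [\mathbf{P}_S \mathbf{N}]^T\,\mathrm{Cov}^{\#}(\mathbf{P}_S N_n)\,\mathrm{Vec}(\mathbf{P}_S N_n).$  (18)

It is not difficult to establish that

$[\mathbf{S}^T \mathbf{N}] = (I \otimes \mathbf{S})^T\,[\mathbf{P}_S \mathbf{N}],$  (19)

$\mathrm{Vec}(\mathbf{S}^T N_n) = (I \otimes \mathbf{S})^T\,\mathrm{Vec}(\mathbf{P}_S N_n).$  (20)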
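
Identity (20) is an instance of the rule $\mathrm{Vec}(AXB) = (B^T \otimes A)\,\mathrm{Vec}(X)$ combined with $\mathbf{S}^T \mathbf{P}_S = \mathbf{S}^T$. A quick numerical check, with arbitrary sizes and random matrices standing in for $\mathbf{S}$ and $N_n$ (assumptions for illustration, NumPy assumed available), reads:

```python
import numpy as np

def vec(mat):
    """Column-stacking Vec operator."""
    return mat.reshape(-1, order="F")

rng = np.random.default_rng(3)
r, p, q = 5, 2, 3                          # hypothetical sizes
S = rng.standard_normal((r, p))
Nn = rng.standard_normal((r, q))
P_S = S @ np.linalg.pinv(S.T @ S) @ S.T    # orthogonal projector onto Span(S), as in (17)

lhs = vec(S.T @ Nn)
rhs = np.kron(np.eye(q), S).T @ vec(P_S @ Nn)
print(np.allclose(lhs, rhs))
```

The column-major (`order="F"`) reshape matters: the Kronecker rule above holds for the column-stacking convention of Vec.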

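Note that inserting identities (19) and (20) into expression (12) of $u_n^{(S,N)}$ results in

$u_n^{(S,N)} = [\mathbf{P}_S \mathbf{N}]^T (I \otimes \mathbf{S})\,((I \otimes \mathbf{S})^T\,\mathrm{Cov}(\mathbf{P}_S N_n)\,(I \otimes \mathbf{S}))^{\#}\,(I \otimes \mathbf{S})^T\,\mathrm{Vec}(\mathbf{P}_S N_n),$  (21)

which would be identical to expression (18) of $u_n^{(P_S,N)}$ if matrix $I \otimes \mathbf{S}$ canceled out in (21). That such a cancellation occurs is not granted a priori because matrix $I \otimes \mathbf{S}$ is not invertible. However, one may prove that for an admissible pair $(S(\cdot), N_n)$:

$\mathrm{rank}((I \otimes \mathbf{S})^T\,\mathrm{Cov}(\mathbf{P}_S N_n)) = \mathrm{rank}(\mathrm{Cov}(\mathbf{P}_S N_n)),$  (22)

$\mathrm{Span}([\mathbf{P}_S \mathbf{N}]) \subset \mathrm{Span}(\mathrm{Cov}(\mathbf{P}_S N_n)).$  (23)

Thus the technical conditions required to apply lemma 2 are fulfilled: matrix $I \otimes \mathbf{S}$ does cancel out in (21) and we can conclude with this theorem.

Theorem 6 (Invariance w.r.t. the probe) Optimally weighted subspace fitting estimates based on the pairs $(S(\cdot), N_n)$ and $(P_S(\cdot), N_n)$ have equivalent pseudo-scores, i.e. $u_n^{(S,N)} = u_n^{(P_S,N)} + o_p(n^{-1/2})$.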

2.3.  Invariance  with  respect  to  the  statistic 
We turn to invariance w.r.t. the statistic. We define

$P_N(\eta) = N(\eta)\,(N(\eta)^T N(\eta))^{\#}\,N(\eta)^T, \quad \mathbf{P}_N = P_N(\theta).$  (24)

The 'sample projector' $\hat{P}_N$ must be defined via an S.V.D. because the rank of $N_n$ is not necessarily equal to the rank of $\mathbf{P}_N$. Thus, if $\rho_N$ is the column rank of $\mathbf{N}$, matrix $\hat{P}_N$ is defined as the orthogonal projector onto the space spanned by the $\rho_N$ most significant left singular vectors of $N_n$. Without strengthening our assumptions, we can establish admissibility of the pair $(S(\cdot), \hat{P}_N)$.
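
The SVD-based definition of the sample projector can be sketched as follows. The function name `sample_projector`, the sizes and the noise level are assumptions made for illustration, and NumPy is assumed available; the limit rank is taken as known, as in the text.

```python
import numpy as np

def sample_projector(stat, rank):
    """Orthogonal projector onto the `rank` dominant left singular vectors."""
    left, _, _ = np.linalg.svd(stat, full_matrices=False)
    basis = left[:, :rank]
    return basis @ basis.T

rng = np.random.default_rng(4)
N = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank-2 limit (assumed)
Nn = N + 1e-3 * rng.standard_normal(N.shape)                   # perturbed statistic
P_hat = sample_projector(Nn, 2)
P_true = N @ np.linalg.pinv(N.T @ N, 1e-10) @ N.T              # projector of (24)
print(np.linalg.matrix_rank(Nn), np.abs(P_hat - P_true).max())
```

The perturbed statistic generically has full rank, so forming the projector directly from `Nn` as in (24) would project onto too large a space; truncating the SVD at the limit rank is what keeps `P_hat` close to the true projector.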

Theorem 7 If $(S(\cdot), N_n)$ is admissible, so is $(S(\cdot), \hat{P}_N)$.

Thus theorem 4 applies to the optimally weighted subspace estimates based on the pair $(S(\cdot), \hat{P}_N)$: they are associated to a pseudo-score given by:

$u_n^{(S,P_N)} = [\mathbf{S}^T \mathbf{P}_N]^T\,\mathrm{Cov}^{\#}(\mathbf{S}^T \hat{P}_N)\,\mathrm{Vec}(\mathbf{S}^T \hat{P}_N).$  (25)

Contrary to invariance w.r.t. the probe $S(\cdot)$, an additional 'rank condition' is required to actually obtain invariance w.r.t. $N_n$.

Definition 2 We say that the rank condition is fulfilled when it holds that

$\mathbf{S}^T N_n = \mathbf{S}^T N_n\,\mathbf{N}^{\#} \mathbf{N} + o_p(n^{-1/2}).$  (26)

One can show that the rank condition holds in several familiar contexts. For instance, it is verified if matrix $\mathbf{N}$ has full column rank or if matrix $N_n$ has (almost surely) the same rank as its limiting value $\mathbf{N}$. Under the rank condition, we can prove

$[\mathbf{S}^T \mathbf{N}] = (\mathbf{N} \otimes I)^T\,[\mathbf{S}^T \mathbf{P}_N],$  (27)

$\mathrm{Vec}(\mathbf{S}^T N_n) = (\mathbf{N} \otimes I)^T\,\mathrm{Vec}(\mathbf{S}^T \hat{P}_N) + o_p(n^{-1/2}).$  (28)

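Inserting relations (27) and (28) in (12) yields

$u_n^{(S,N)} = [\mathbf{S}^T \mathbf{P}_N]^T (\mathbf{N} \otimes I)\,((\mathbf{N} \otimes I)^T\,\mathrm{Cov}(\mathbf{S}^T \hat{P}_N)\,(\mathbf{N} \otimes I))^{\#}\,(\mathbf{N} \otimes I)^T\,\mathrm{Vec}(\mathbf{S}^T \hat{P}_N) + o_p(n^{-1/2}),$

which reduces to (25) if matrix $\mathbf{N} \otimes I$ cancels out in the expression above. For an admissible pair $(S(\cdot), N_n)$, we can prove under the rank condition that

$\mathrm{rank}((\mathbf{N} \otimes I)^T\,\mathrm{Cov}(\mathbf{S}^T \hat{P}_N)) = \mathrm{rank}(\mathrm{Cov}(\mathbf{S}^T \hat{P}_N)),$  (29)

$\mathrm{Span}([\mathbf{S}^T \mathbf{P}_N]) \subset \mathrm{Span}(\mathrm{Cov}(\mathbf{S}^T \hat{P}_N)).$  (30)

Thus the technical conditions required to apply lemma 2 are verified: matrix $\mathbf{N} \otimes I$ does cancel out. We obtain:

Theorem 8 Under the rank condition, optimally weighted subspace fitting estimates based on the pairs $(S(\cdot), N_n)$ and $(S(\cdot), \hat{P}_N)$ have equivalent pseudo-scores: $u_n^{(S,N)} = u_n^{(S,P_N)} + o_p(n^{-1/2})$.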

2.4.  Discussion 

We now combine the results of previous sections. In particular, the pair $(P_S(\cdot), \hat{P}_N)$ is admissible and, under the rank condition, the optimal subspace fitting estimate based on $(P_S(\cdot), \hat{P}_N)$ admits a pseudo-score $u_n^{(P_S,P_N)}$ which is equivalent to $u_n^{(S,N)}$, $u_n^{(P_S,N)}$ and $u_n^{(S,P_N)}$. It also follows that the asymptotic covariance matrix of subspace fitting estimates based on any of the pairs $(S(\cdot), N_n)$, $(P_S(\cdot), N_n)$, $(S(\cdot), \hat{P}_N)$ and $(P_S(\cdot), \hat{P}_N)$ is lower bounded by $J_F^{-1}$ where:

$J_F = [\mathbf{P}_S \mathbf{P}_N]^T\,\mathrm{Cov}^{\#}(\mathbf{P}_S \hat{P}_N)\,[\mathbf{P}_S \mathbf{P}_N].$  (31)

We have thus completed a first part of our invariance program in showing that only subspace quantities are relevant to subspace fitting. The next step is to consider the case when the spaces spanned by $\mathbf{S}$ and $\mathbf{N}$ are complementary: if $S(\theta)$ and $N(\theta)$ taken altogether span the whole space, i.e.

$P_N(\theta) + P_S(\theta) = I,$  (32)

we say that the saturation condition is met. One can then express pseudo-scores and information matrices in terms of quantities pertaining to only one of the two (complementary) subspaces. For instance, one easily finds that

Jf  “J:'  [P%-PsfCov*(PsPs)[P'^sPs]  (33) 

where $\mathbf{P}_N^{\perp} = I - \mathbf{P}_N$. The saturation condition makes it possible to compare subspace fitting to subspace matching, as is done next.

3.  RELATION  TO  SUBSPACE  MATCHING 

We  consider  optimal  estimation  based  on  Pn  (subspace 
matching)  and  its  relationship  to  subspace  fitting  estimates 
studied  in  previous  sections.  The  saturation  condition  is 
assumed  to  hold  throughout  this  section  since  no  close  re¬ 
lationships  between  the  two  approaches  can  be  expected  to 
be  found  otherwise. 

Statistic  matching  estimates  are  obtained  as 

=  argmin  \\Pn  -  PNiv^v-  (34) 

»7 

For appropriate choices of the weighting matrix $V$, statistic matching estimates are asymptotically normal with some covariance matrix $\Sigma_V$. The theory of statistic matching can easily be adapted to the current context in spite of the fact that $\mathrm{Cov}(\hat{P}_N)$ necessarily is rank deficient. In particular, one finds that the asymptotic covariance matrix $\Sigma_V$ is lower bounded by the inverse of the matrix $J_M$:

Jm  =  [PNf  Cox*  {Pn)[Pn]  (35) 

whose relation to $J_F$ is given below. It is well known that under appropriate regularity conditions, $J_M^{-1}$ is an asymptotic lower bound on the covariance of any estimate of $\theta$ obtained as a function of $\hat{P}_N$.

We set out to compare subspace matching estimates based on $\hat{P}_N$ to subspace fitting estimates based on the pair $(P_S(\cdot), \hat{P}_N)$.

Theorem  9  Under  the  saturation  condition  (32),  for  any 
admissible  weight  W,  B^  —  for  V  =  (Pjv0 

VsYW{Vn®Ps). 

Thus we see that any subspace fitting estimate using weight $W$ is equivalent to some subspace matching estimate with a weight $V$ which is simply related to $W$. The converse property is more difficult to establish because not every matrix $V$ can be put in the form mentioned in Theorem 9. However, we do have

Theorem  10  Under  the  saturation  condition  (32),  for  any 
admissible  matrix  V,  6^  =  6^ op{n-^^^)  for  some  admissi- 
ble  weight  W. 

Unfortunately, space is lacking to describe how the weight $W$ relates to $V$ in Theorem 10. A direct consequence of the previous two theorems is

Theorem 11 Under the saturation condition (32), $J_M = J_F$.

This theorem shows that optimally weighted subspace fitting estimates are asymptotically 'efficient' under saturation, in the sense that their asymptotic performance reaches the lower bound set by optimal subspace matching estimation.

Abstract

Most conventional techniques for Independent Component Analysis (ICA) resort to second-order statistics to decorrelate the observed data. The prewhitening step makes these algorithms sensitive to the presence of additive Gaussian noise. In this paper a higher-order-only technique is presented. The identification problem is approached in a (linear and multilinear) algebraic framework: our derivation starts with the observation that the solution can be obtained from the Canonical Decomposition (CANDECOMP) of a higher-order cumulant tensor. Next, it is demonstrated that the CANDECOMP components follow from the simultaneous diagonalization, by congruence transformation, of a set of matrices. A reformulation in terms of orthogonal unknowns leads to a simultaneous Schur decomposition, which is solved by a Givens-type iteration. The technique can be considered as the higher-order-only equivalent of the popular JADE algorithm.


1  Introduction 

The basic statistical model for Independent Component Analysis (ICA), or Blind Source Separation, is in this paper denoted as:

$Y = MX + N$  (1)

in which the observed vector $Y$, the source vector $X$ and the noise vector $N$ are zero-mean random vectors with values in $\mathbb{R}$ or $\mathbb{C}$. The components of $X$ are mutually statistically independent, as well as statistically independent from the noise components. The goal of ICA now consists of the estimation of the transfer matrix (or "mixture matrix") $M$ and the corresponding realizations of $X$, given only realizations of $Y$.

This research was partially supported by the Belgian Program on Interuniversity Attraction Poles (IUAP-17, IUAP-50), the European Community Research program ESPRIT, Basic Research Working Group nr. 6620 (ATHOS), the Flemish Institute for Support of Scientific-Technological Research in Industry (I.W.T.) and is part of a Concerted Action Project of the Flemish Community, entitled "Model-based Information Processing Systems". Lieven De Lathauwer is a Research Assistant supported by the I.W.T. Bart De Moor is a Research Associate of the National Fund for Scientific Research (N.F.W.O.). The scientific responsibility is assumed by the

Without a priori knowledge the ICA problem cannot be solved using only second-order statistics. Usually the second-order statistics of the observation vector $Y$ are used for a whitening of the data. In this way the transfer matrix can be estimated up to an orthogonal factor $U$. In the second step $U$ is then obtained from higher-order cumulants of the standardized data. Several algorithms have been presented in the literature. Among the best-known approaches are the one by Comon [3] (further analyzed in [7]), where $U$ is computed by a Jacobi-type diagonalization of the standardized cumulant tensor, and the JADE algorithm (Joint Approximate Diagonalization of Eigenmatrices) by Cardoso and Souloumiac [2], where $U$ is found as the solution of a simultaneous eigenvalue decomposition.

In our paper the problem is solved using only the higher-order cumulant. This approach has the advantage that it is conceptually blind to the noise term $N$ when this term is Gaussian. For simplicity of notation, the exposition in this summary is restricted to fourth-order processing of real-valued data. The technique can be applied to cumulants of any order (higher than 2), as well as to complex data.

The paper is organized as follows. In the next section the relation between the columns of $M$ and the fourth-order observation cumulant is made explicit. This relation takes the form of a tensorial decomposition of the cumulant in a sum of symmetric rank-1 tensors, and the uniqueness of this decomposition is discussed. In Section 3 the estimation of the transfer matrix from the cumulant model is presented as a simultaneous congruence transformation. In Section 4 it is explained how the problem can be reformulated in terms of unknown orthogonal matrices. This leads to the simultaneous Schur decomposition of Section 5, for which a Givens-type computation scheme is derived. Section 6 contains a concluding discussion.

2  Canonical Decomposition

2.1  Model

As the name already suggests, the CANDECOMP of higher-order tensors is a basic concept in multilinear algebra. First we define the tensorial outer product of a set of vectors:

Definition 1  The outer product of the vectors $U^{(1)} \in \mathbb{R}^{I_1}$, $U^{(2)} \in \mathbb{R}^{I_2}$, ..., $U^{(N)} \in \mathbb{R}^{I_N}$, denoted as $U^{(1)} \circ U^{(2)} \circ \ldots \circ U^{(N)}$, is an $(I_1 \times I_2 \times \ldots \times I_N)$-tensor $\mathcal{A}$ defined by the following element-wise equation:

$a_{i_1 i_2 \ldots i_N} = u^{(1)}_{i_1}\, u^{(2)}_{i_2} \cdots u^{(N)}_{i_N}$

In analogy with the vector/matrix case, the outer product leads to the definition of rank-1 tensors:

Definition 2  An Nth-order tensor $\mathcal{A}$ has rank 1 when it equals the outer product of N vectors $U^{(1)}, U^{(2)}, \ldots, U^{(N)}$.

These elementary definitions allow us to define the CANDECOMP:

Definition 3  The Canonical Decomposition (CANDECOMP) of an Nth-order tensor $\mathcal{A}$ is the decomposition of $\mathcal{A}$ in a minimal sum of rank-1 components.

The decomposition is also known as the Parallel Factors Model (PARAFAC). It can be considered as the tensorial generalization of the diagonalization of matrices by equivalence transformation (unsymmetric case) or by congruence transformation (symmetric case). Despite the importance of CANDECOMP, no robust general computation schemes have been proposed in the past.

2.2  Link with ICA

When the noise $N$ is Gaussian, it does not contribute to the fourth-order cumulant of $Y$. This cumulant, denoted by $C$, then shows the following structure:

$C = \sum_p \kappa_p\, M_p \circ M_p \circ M_p \circ M_p$  (2)

where $\kappa_p$ denotes the fourth-order cumulant of the pth source ($1 \le p \le P$) and $M_p$ symbolizes the pth "steering vector" (i.e. the pth column of $M$). Eq. (2) is clearly a symmetric CANDECOMP model. The contribution of a non-Gaussian noise component, and the effect of other estimation errors when $C$ is a finite sample cumulant, is considered as a perturbation of the equation.

2.3  Uniqueness

The uniqueness properties of CANDECOMP and its matrix counterparts are fundamentally different. Here we assume that the transfer matrix is square and regular, and that all the sources have non-vanishing kurtosis. It can be proved [9] that these conditions are sufficient to guarantee that decomposition (2) is unique up to the following trivial indeterminacies:

•  permutation of the terms

•  scaling of the steering vectors with a factor $\alpha_p$, combined with inverse scaling (factor $\alpha_p^{-4}$) of the coefficients $\kappa_p$.

(The interested reader is referred to the overview paper [4] for a discussion of some other uniqueness properties.) Note that in our setting different sources can have the same probability distribution, as long as they are mutually statistically independent in fourth order. The conditions can be weakened for the identification of at most one non-kurtic source. It is also possible to handle the "more-sensors-than-sources" case.
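The two trivial indeterminacies can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the 2x2 mixing matrix, the source kurtoses and the scale factors are hypothetical values, and the helper name `rank1_sum` is not from the paper. It builds the fourth-order tensor as a sum of symmetric rank-1 terms and confirms that permuting the terms, or scaling column p by a factor alpha_p while scaling the pth coefficient by alpha_p to the power -4, leaves the tensor unchanged.

```python
from itertools import product

def rank1_sum(kappa, M):
    """Return a dict keyed by (i, j, k, l) holding
    sum_p kappa[p] * M[i][p] * M[j][p] * M[k][p] * M[l][p]."""
    n, P = len(M), len(kappa)
    C = {}
    for idx in product(range(n), repeat=4):
        C[idx] = sum(
            kappa[p] * M[idx[0]][p] * M[idx[1]][p] * M[idx[2]][p] * M[idx[3]][p]
            for p in range(P)
        )
    return C

# Hypothetical mixing matrix (columns are the steering vectors) and kurtoses.
M = [[1.0, 0.5],
     [-0.3, 2.0]]
kappa = [-1.2, 0.8]
C = rank1_sum(kappa, M)

# Indeterminacy 1: permutation of the terms.
M_perm = [[row[1], row[0]] for row in M]
C_perm = rank1_sum([kappa[1], kappa[0]], M_perm)

# Indeterminacy 2: scale column p by alpha[p], coefficient p by alpha[p] ** -4.
alpha = [2.0, -0.5]
M_scaled = [[row[p] * alpha[p] for p in range(2)] for row in M]
kappa_scaled = [kappa[p] / alpha[p] ** 4 for p in range(2)]
C_scaled = rank1_sum(kappa_scaled, M_scaled)

# Largest deviation between the original tensor and the two variants.
max_dev = max(
    max(abs(C[i] - C_perm[i]), abs(C[i] - C_scaled[i])) for i in C
)
```

The exponent -4 reflects the fact that each steering vector appears four times in every rank-1 term of a fourth-order tensor.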

3  Simultaneous  Congruence  Transformation 

We associate to $C$ a linear matrix transformation in the following way:

$B = C(A) \iff b_{ij} = \sum_{kl} c_{ijkl}\, a_{kl}$  (3)

for all index values. From Eq. (2) it follows that every matrix in the range space of $C$ can be written as a linear combination of the "steering matrices" $M_p M_p^T$ ($1 \le p \le P$). In other words, the transfer matrix $M$ diagonalizes every matrix in the range space of $C$ by congruence transformation. Assume a basis for the range space is given by $T_1, T_2, \ldots, T_P$, then we have the following simultaneous congruence transformation:

$T_1 = M \cdot D_1 \cdot M^T$
$T_2 = M \cdot D_2 \cdot M^T$
$\vdots$
$T_P = M \cdot D_P \cdot M^T$  (4)

where $D_1, D_2, \ldots, D_P$ are diagonal. Remarkably, a similar set of equations arises in the blind separation of constant-modulus signals [11], which suggests an unexpected link between the constant-modulus property and non-Gaussianity. The computational technique of this paper differs from the one presented in [11], the latter being suboptimal.

Although two equations in (4) are generally sufficient to estimate the transfer matrix, we prefer to solve the complete set simultaneously, in order to exploit all the available information. This can be substantiated by numerical arguments [9].

The simultaneous solution of Eq. (4) is the higher-order-only equivalent of the simultaneous eigenvalue decomposition on which the ICA algorithm by Cardoso and Souloumiac is based [2]. In the latter algorithm pre-whitening leads to a simultaneous matrix decomposition from which an orthogonal matrix has to be computed; in the current approach a general regular matrix has to be determined (up to the indeterminacies mentioned in Section 2.3), corresponding to the fact that there is no pre-whitening.
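As a small illustration of the congruence structure, the following sketch applies the cumulant tensor of Eq. (2) to an arbitrary matrix and compares the result with $M \cdot D \cdot M^T$, where the pth diagonal entry of $D$ is the pth source cumulant times $M_p^T A M_p$. The numerical values and the helper names `cumulant` and `apply_map` are assumptions made for the example, not part of the paper.

```python
def cumulant(kappa, M):
    """Fourth-order tensor c_ijkl = sum_p kappa_p M_ip M_jp M_kp M_lp (nested lists)."""
    n, P = len(M), len(kappa)
    r = range(n)
    return [[[[sum(kappa[p] * M[i][p] * M[j][p] * M[k][p] * M[l][p]
                   for p in range(P))
               for l in r] for k in r] for j in r] for i in r]

def apply_map(C, A):
    """Matrix-to-matrix mapping B = C(A): b_ij = sum_kl c_ijkl a_kl."""
    r = range(len(A))
    return [[sum(C[i][j][k][l] * A[k][l] for k in r for l in r) for j in r]
            for i in r]

# Hypothetical transfer matrix, source cumulants and test matrix.
M = [[1.0, 0.5],
     [-0.3, 2.0]]
kappa = [-1.2, 0.8]
A = [[0.7, -1.0],
     [2.0, 0.1]]

B = apply_map(cumulant(kappa, M), A)

# Diagonal factor: d_p = kappa_p * (M_p^T A M_p).
d = [kappa[p] * sum(M[k][p] * A[k][l] * M[l][p]
                    for k in range(2) for l in range(2))
     for p in range(2)]

# Congruence form M diag(d) M^T, written out element by element.
B_expected = [[sum(M[i][p] * d[p] * M[j][p] for p in range(2))
               for j in range(2)] for i in range(2)]

err = max(abs(B[i][j] - B_expected[i][j]) for i in range(2) for j in range(2))
```

Every choice of the matrix `A` yields another matrix of the same form with a different diagonal factor, which is why a whole set of such matrices can be diagonalized simultaneously by the same $M$.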

4  A  new  matrix  representation 

The fact that the unknown transfer matrix is basically an arbitrary regular matrix makes it hard to deal with in a proper numerical way. Therefore we will represent the mixture matrix by a pair of orthogonal matrices, obtained from the QR-factorisation $M = Q^T \cdot R'$ and the RQ-decomposition $M^T = R'' \cdot Z^T$. The pair $(Q, Z)$ is actually an equivalent representation of the transfer matrix, within the limits of identifiability. From the definition of $Q$ and $Z$ we have:

$(Q \cdot Z) \cdot R''^T = R'$  (5)

The orthogonal matrix $Q \cdot Z$ will be denoted as $V$. The lower triangular part of Eq. (5) is a system of linear equations in the unknown coefficients of $R''$:

$\begin{bmatrix} v_{P,P-1} & v_{P,P} \end{bmatrix} \begin{bmatrix} r''_{P-1,P-1} \\ r''_{P-1,P} \end{bmatrix} = 0$

$\begin{bmatrix} v_{P-1,P-2} & v_{P-1,P-1} & v_{P-1,P} \\ v_{P,P-2} & v_{P,P-1} & v_{P,P} \end{bmatrix} \begin{bmatrix} r''_{P-2,P-2} \\ r''_{P-2,P-1} \\ r''_{P-2,P} \end{bmatrix} = [0 \;\; 0]^T$

$\vdots$  (6)

Note that a scaling of the rows of $R''$ does not affect this homogeneous set of equations, which is consistent with the fact that the steering vectors can only be determined up to a scalar multiple. By substitution of $R''$ in Eq. (5), $R'$ can be found as well.
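The representation by a pair of orthogonal matrices can be tried out on a 2x2 case, where the QR- and RQ-type factors reduce to single plane rotations. The sketch below is a hedged illustration, not the authors' implementation: the matrix values are hypothetical, and the conventions $Q M = R'$ and $M^T Z = R''$ (both upper triangular) are the ones assumed here. It checks that $Q \cdot T \cdot Z$ is upper triangular for any $T = M \cdot D \cdot M^T$ with diagonal $D$, and that $(Q \cdot Z) \cdot R''^T = R'$.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

M = [[1.0, 0.5],
     [-0.3, 2.0]]          # hypothetical regular transfer matrix

# Q: rotation that zeroes the (2,1) entry of M, so Q M = R' is upper triangular.
r1 = math.hypot(M[0][0], M[1][0])
Q = [[M[0][0] / r1, M[1][0] / r1],
     [-M[1][0] / r1, M[0][0] / r1]]

# Z: rotation that zeroes the (2,1) entry of M^T Z, so M^T Z = R'' is upper triangular.
r2 = math.hypot(M[0][1], M[1][1])
Z = [[M[1][1] / r2, M[0][1] / r2],
     [-M[0][1] / r2, M[1][1] / r2]]

R1 = matmul(Q, M)                  # plays the role of R'
R2 = matmul(transpose(M), Z)       # plays the role of R''

# Any T = M D M^T with diagonal D becomes upper triangular under (Q, Z).
below_diag = []
for D in ([[2.0, 0.0], [0.0, -1.0]], [[0.5, 0.0], [0.0, 3.0]]):
    T = matmul(matmul(M, D), transpose(M))
    R = matmul(matmul(Q, T), Z)
    below_diag.append(abs(R[1][0]))

# Identity linking the two triangular factors: (Q Z) R''^T = R'.
V = matmul(Q, Z)
lhs = matmul(V, transpose(R2))
eq5_err = max(abs(lhs[i][j] - R1[i][j]) for i in range(2) for j in range(2))
```

For larger matrices the same factors would come from a general QR routine; the explicit rotations are used here only to keep the example dependency-free.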

5  Simultaneous Schur Decomposition

5.1  Principle

The notation of the simultaneous congruence transformation in terms of $Q$ and $Z$ leads to a set of matrix equations that we will denote as a simultaneous Schur decomposition:

$Q \cdot T_1 \cdot Z = R_1 = R' \cdot D_1 \cdot R''$
$Q \cdot T_2 \cdot Z = R_2 = R' \cdot D_2 \cdot R''$
$\vdots$
$Q \cdot T_P \cdot Z = R_P = R' \cdot D_P \cdot R''$  (7)

From these equations the orthogonal matrices $Q$ and $Z$ have to be determined such that $R_1, R_2, \ldots, R_P$ are "as upper triangular as possible" (in least-squares sense). The criterion function $f$ to be minimized can be written as:

$f(Q, Z) = \|Q \cdot T_1 \cdot Z\|_{LF}^2 + \ldots + \|Q \cdot T_P \cdot Z\|_{LF}^2$  (8)

in which $\|A\|_{LF}$ denotes the below-diagonal Frobenius-norm of $A$, i.e.

$\|A\|_{LF} = \left( \sum_{j<i} a_{ij}^2 \right)^{1/2}$  (9)

It can be proved that the criterion $f$ satisfies all the conditions for a higher-order-only contrast function [3, 5] that discriminates over the set of regular transfer matrices. Contrary to classical approaches, this contrast depends on two orthogonal matrices.

5.2  Solution by Givens-iteration

The core of our method is the computation of $Q$ and $Z$ from Eq. (7). The criterion function $f(Q, Z)$ is optimized by an iteration technique, in which $Q$ and $Z$ are determined as a sequence of elementary Givens rotations. Each elementary rotation makes the set $R_1, R_2, \ldots, R_P$ simultaneously as upper triangular as possible.

First, the estimates of $Q$ and $Z$ are initialized as any orthogonal matrix, e.g. as the identity matrix: $Q^{(0)} = Z^{(0)} = I$. The estimates of $R_1, R_2, \ldots, R_P$ are initialized accordingly: $R_1^{(0)} = T_1$, $R_2^{(0)} = T_2$, ..., $R_P^{(0)} = T_P$. In each iteration step $k$ either $Q^{(k)}$ or $Z^{(k)}$ is updated. An update of $Q^{(k)}$ takes the form of $Q^{(k+1)} = G_{ij}^T \cdot Q^{(k)}$, in which $G_{ij}$ denotes an elementary Givens rotation that affects rows $i$ and $j$; at the same time $R_1^{(k)}, R_2^{(k)}, \ldots, R_P^{(k)}$ are updated as $R_1^{(k+1)} = G_{ij}^T \cdot R_1^{(k)}$, $R_2^{(k+1)} = G_{ij}^T \cdot R_2^{(k)}$, ..., $R_P^{(k+1)} = G_{ij}^T \cdot R_P^{(k)}$. $Z^{(k)}$ is updated in a similar way, by working on the columns.

Let us focus on the updating of $Q^{(k)}$ by multiplication with $G_{ij}^T$. The Givens rotation should be determined such that it minimizes the below-diagonal norm of all the ith and jth rows in $R_1^{(k+1)}, \ldots, R_P^{(k+1)}$. If we define

$E^{(k)} = \left( R_1^{(k)}([i\ j],\, i:j-1) \;\; \ldots \;\; R_P^{(k)}([i\ j],\, i:j-1) \right)$  (10)

using MATLAB-notation, then $G_{ij}$ should be determined such that the second row of $E^{(k+1)} = G_{ij}^T \cdot E^{(k)}$ has minimal Frobenius-norm. Hence $G_{ij}$ can be obtained as the left singular matrix of $E^{(k)}$; in this way the norm of the second row equals the smallest singular value of $E^{(k)}$, which is the best one can do. In simulations the Givens-iteration shows a monotonic convergence to the global optimum of $f(Q, Z)$.
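The elementary step of the iteration can be illustrated in isolation. For a two-row matrix, the rotation built from its left singular vectors has a closed form: its angle is that of the principal axis of the 2x2 Gram matrix. The sketch below is a minimal illustration under stated assumptions: the matrix `E` holds arbitrary test values rather than entries collected from actual $R_p$ matrices, and the function names are not from the paper. It rotates the two rows and checks that the squared norm of the second row equals the smaller eigenvalue of $E E^T$, i.e. the square of the smallest singular value.

```python
import math

def best_rotation(E):
    """2x2 rotation G such that the second row of G^T E has minimal norm."""
    a, b = E
    p = sum(x * x for x in a)                  # squared norm of first row
    q = sum(x * y for x, y in zip(a, b))       # inner product of the rows
    r = sum(y * y for y in b)                  # squared norm of second row
    theta = 0.5 * math.atan2(2.0 * q, p - r)   # principal axis of E E^T
    c, s = math.cos(theta), math.sin(theta)
    # Columns: dominant and minor left singular vectors (up to sign).
    return [[c, -s],
            [s, c]]

def apply_transposed(G, E):
    """Return G^T E for a 2x2 matrix G and a two-row matrix E."""
    (g11, g12), (g21, g22) = G
    a, b = E
    row1 = [g11 * x + g21 * y for x, y in zip(a, b)]
    row2 = [g12 * x + g22 * y for x, y in zip(a, b)]
    return [row1, row2]

# Arbitrary two-row test matrix standing in for the stacked row segments.
E = [[1.0, -2.0, 0.5, 3.0],
     [0.4, 1.5, -1.0, 2.0]]

G = best_rotation(E)
E_new = apply_transposed(G, E)
second_row_sq = sum(y * y for y in E_new[1])

# Smaller eigenvalue of the Gram matrix E E^T, for comparison.
p = sum(x * x for x in E[0])
r = sum(y * y for y in E[1])
q = sum(x * y for x, y in zip(E[0], E[1]))
lam_min = 0.5 * (p + r) - math.hypot(0.5 * (p - r), q)
```

A full iteration would sweep over all index pairs, alternating row rotations (updates of $Q$) and column rotations (updates of $Z$), and apply each rotation to every $R_p$.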

6  Discussion  and  conclusions 

We presented a new ICA technique that resorts only to the higher-order cumulants of the observations. The transfer matrix estimate that is obtained shows exactly the same uniqueness properties as in the classical ICA algorithms [2, 3], which also exploit second-order information in a prewhitening step. Higher-order-only Blind Source Separation has the advantage that it is asymptotically insensitive to additive Gaussian perturbations of the data. When dealing with finite sample cumulants, the accuracy of higher-order-only versus classical approaches is subject to a trade-off, caused by the fact that higher-order statistics are harder to estimate than second-order statistics [1, 10].

Our approach is based on the observation that the data cumulant can be expanded as a sum of rank-1 tensors. This implies that all the matrices in the range space of the cumulant tensor, considered as a super-symmetric matrix-to-matrix mapping, satisfy a simultaneous congruence transformation. For the numerical computation of this set of matrix equations we proposed a new representation of the transfer matrix: it turns out that any matrix, of which the columns are fixed up to multiplication with a scalar, can be represented by a pair of orthogonal matrices, obtained by QR- and RQ-factorisation. In this new format the simultaneous congruence transformation takes the form of a simultaneous Schur decomposition, which can be computed by a Givens-type iteration. The result can be considered as an approximate solution of cumulant-based identification criteria; e.g. least-squares cumulant matching can be realized by means of a standard optimization routine, using the simultaneous Schur solution as starting value [8].

The technique established in this paper is in fact the higher-order-only equivalent of the well-known combined second/higher-order ICA algorithm by Cardoso and Souloumiac [2]. The concepts of this paper also lead to a higher-order-only equivalent ([9]) of the ICA algorithm by Comon [3].

The technique can also be generalized for higher-order tensors without symmetry properties [6]. The unsymmetric version of the algorithm can be used for Factor Analysis of multiway datasets.

Abstract

A novel cross-correlation based framework is proposed for the problem of blind equalization in communications. We assume that we have access to two observations, corresponding to the outputs of two channels excited by the same input. We propose a new algorithm which estimates the channels using as its basic tool the phase of the cross spectrum of functions of the observations. The proposed method is computationally attractive, requires small input sample sizes, and performs well at low signal-to-noise ratios.


1.  Introduction 

Blind equalization is the problem of reconstructing a signal from a filtered version of it, without knowledge of either the signal or the filter. Research results dealing with the case of non-white signals have been reported in the past [1], [4], [5], [2], [3], using either multiple observations of the distorted signal [4], or oversampling of the received signal [2], [3]. Both approaches lead to a multichannel scenario where the input signal is to be estimated from multiple distorted versions of it.

The approach of [4], [5] uses higher-order statistics of the observations, while that of [2], [3] is based on the cyclic autocorrelation of the observations. The second approach has the advantage that it can be applied to any type of input signal, as opposed to the first approach, which applies to non-Gaussian signals only. The cyclic approach has a lower complexity than the higher-order statistics based approach; however, its performance and the uniqueness of the solution are critically related to the knowledge (or ability to get exact estimates) of the lengths of the unknown channels.

In this paper we present a new cross-correlation based approach which estimates the channels by simultaneously minimizing two error criteria involving the phase of a combination of the two channels. This phase is estimated from the observations based on cross-correlation operations. We show that the proposed method is not very sensitive to channel length mismatch, requires small input sample sizes, and performs well at low SNRs.

2.  Problem  Formulation 

The two-channel case will be described next; however, the results can be easily extended to the multichannel case. The unknown system model is described by

$x_i(k) = h_i(k) * s(k) + n_i(k), \quad i = 1, 2$  (1)

where $x_i(k)$, $i = 1, 2$ denote the observations; $h_i(k)$, $i = 1, 2$ are the unknown FIR channels; $s(k)$ is a stationary, generally non-white, zero-mean random process; $n_i(k)$ are noise processes uncorrelated with each other and with $s(k)$. It is assumed that $h_1(k)$ and $h_2(k)$ have no common zeros, that there are no zero-pole cancellations between $h_i(k)$, $i = 1, 2$ and convolutional components of $s(k)$, and that there are no common zeros between convolutional components of $s(k)$ and each of the channels. Under these conditions, the channels $h_1(k)$ and $h_2(k)$ are identifiable within a constant and a delay. In the sequel we present an algorithm that performs the identification task.

3.  The Cross-Correlation Blind Equalization Algorithm

Let us model the random process $s(k)$ as

$s(k) = e(k) * h(k)$  (2)

where $e(k)$ is a white, zero-mean process. Combining (1) and (2) we get

$x_i(k) = e(k) * h(k) * h_i(k) + n_i(k) = e(k) * g_i(k) + n_i(k)$  (3)

with $g_i(k) = h(k) * h_i(k)$. The cross correlation of $x_1(k)$ and $x_2(k)$ equals

$r_{x_1 x_2}(k) = E\{x_1(n+k)\, x_2^*(n)\} = \gamma_e^2\, g_1(k) * g_2^*(-k)$  (4)

where $\gamma_e^2$ is the variance of $e(k)$. The contribution of $n_1(k)$ and $n_2(k)$ to (4) is zero, due to the fact that the noise processes are zero-mean, uncorrelated with each other and uncorrelated with $e(n)$.

The  minimum  phase  equivalent  of  an  all-zero  se¬ 
quence  y{n),  denoted  by  is  a  minimum  phase 

sequence,  whose  zeros  consist  of  the  minimum  phase 
zeros  of  y{n)  and  the  maximum  phase  zeros  of  y{n),  re¬ 
flected  inside  the  unit  circle  at  their  conjugate  recipro¬ 
cal  locations.  One  simple  method  to  estimate 
is  the  power  cepstrum  based  approach, i.e., 

_  F~^{exp[C{u>)]} 

c(n)  =  (5) 

where $C(\omega)$ is the Fourier transform of $c(n)$ and $u(n)$ is the unit step function. The minimum phase equivalent of a random sequence $x(k) = h(k) * e(k)$, where $e(k)$ is white and $h(k)$ deterministic, is equal to the minimum phase equivalent of the deterministic part $h(k)$.
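A cepstrum-based computation of the minimum phase equivalent can be sketched as follows. This is an illustrative variant that folds the real cepstrum of the log-magnitude spectrum, and it is not claimed to reproduce the exact form of Eq. (5); the function names, the DFT size and the test sequence are assumptions. The example sequence has zeros at $z = 2$ and $z = 0.5$, so its minimum phase equivalent should have a double zero at $z = 0.5$ and the same magnitude spectrum.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (adequate for short sequences)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def minimum_phase_equivalent(y, nfft=128):
    """Minimum phase sequence with the same magnitude spectrum as the FIR sequence y.

    Assumes y has no zeros on the unit circle, so the log-magnitude is finite.
    """
    padded = list(y) + [0.0] * (nfft - len(y))
    Y = dft(padded)
    log_mag = [math.log(abs(v)) for v in Y]
    c = [v.real for v in dft(log_mag, inverse=True)]     # real cepstrum
    # Fold: keep c(0), double the causal part, discard the anticausal part.
    folded = [0.0] * nfft
    folded[0] = c[0]
    for n in range(1, nfft // 2):
        folded[n] = 2.0 * c[n]
    folded[nfft // 2] = c[nfft // 2]
    log_spectrum = dft(folded)                           # log of min-phase spectrum
    y_min = dft([cmath.exp(v) for v in log_spectrum], inverse=True)
    return [v.real for v in y_min[:len(y)]]

y = [1.0, -2.5, 1.0]        # (1 - 2 z^-1)(1 - 0.5 z^-1): zeros at 2 and 0.5
y_min = minimum_phase_equivalent(y)
```

Reflecting the zero at $z = 2$ to $z = 0.5$ and keeping the magnitude gives $2(1 - 0.5 z^{-1})^2$, i.e. the coefficients 2, -2, 0.5, which is what the sketch should return up to numerical precision.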

Let  d{k)  be  defined  as 

dik)  =  *  gf^ik)  *  {gr^{-k)r  (6) 

The z-transform of a deterministic, FIR, generally complex sequence can be decomposed as

$H(z) = c_h\, z^{-r_h}\, I_h(z^{-1})\, O_h(z)$  (7)

where $I_h(z^{-1})$ and $O_h(z)$ are the minimum and maximum phase parts of $h(n)$ respectively, $c_h$ is a constant and $r_h$ equals the number of zeros of $h(n)$ outside the unit circle.

Taking  the  Fourier  transform  of  d{k)  and  using  (4) 
and  (7),  we  get 

Diz)  =  (z-i)/;,(z*)]2Pi(z),  (8) 


Piiz)  =  0,,iz)Ol{l/z*)OMOl{l/z*)  (9) 


Equation (10) is of key importance, since the equalization scheme described in the sequel is based on it. Let us consider the filtered observations $y_i(k)$, $i = 1, 2$ obtained as:

$y_i(k) = x_i(k) * w_i(k)$  (11)

where $w_i(k)$, $i = 1, 2$ are FIR channels. Let

Ema.{z)  =  (12) 

be  the  cross  spectrum  of  the  minimum  and  maximum 
phase  parts  of  the  adaptive  filter  outputs  respectively. 
We  show  that 

arg{Emin{(^)}  =  0  V  u; 

(^)  “  UTld  i/i2  (^)  —  ^iui(^)* 

and  similarly, 

arg{Emax{i^)}  =  0  ^  uj 

Oh^{n)  =  o«,2(n)  and  Oh:,{n)  =  Oyj^{n) 

The  proofs  can  be  found  in  [5] ,  [6] .  Combining  Propo¬ 
sitions  1  and  2,  we  get  that 

arg{Eminii*>)}  =  0  and  arg{Emax{^)}  =  0  V  w 
hi{n)  =  W2{n)  and  h2{n)  =  wi{n). 

(15) 

3.1.  Channel  Identification/Equalization 
Let 

(16) 

and also let us assume temporarily that the length of $w_{min}(n)$, $L_{min}$, is known. Setting the phase of $E_{min}(\omega)$ to zero, and after some mathematical manipulations, we get

En=-.^V3,n5^0  “  i>min{^))- 

12nL-N2  =  Sw(^min(w)) 

(17) 

where 


Since $P_1(\omega)$ is zero-phase, (8) leads to the following phase relation

$\arg\{D(\omega)\} = 2\arg\{I_{g_1}(\omega)\, I_{g_2}^*(\omega)\} + (r_{g_1} - r_{g_2})\omega$
$\qquad\qquad\;\; = 2\arg\{I_{h_1}(\omega)\, I_{h_2}^*(\omega)\} + (r_{h_1} - r_{h_2})\omega$  (10)

where we ignored the phase contribution of the terms $c_{g_1}$ and $c_{g_2}$, since this is an additive constant, contributing a complex scalar to the corresponding time domain signal.


I  TT 

$\psi_{min}(\omega) = \arg\{H_{min}(\omega)\}$,  (18)

$N_1$ is the length of the causal part of $w_{min}(n)$, and $N_2$ the length of its noncausal part. Through (10), (18) becomes

$\psi_{min}(\omega) = \frac{1}{2}\left(\arg\{D(\omega)\} - (r_{h_1} - r_{h_2})\omega\right)$  (19)

In (17) it was taken $w_{min}(0) = 1$.

Repeating (17) for different $\omega$'s in the set $\{\omega_l,\ l = 0, \ldots, L/2\}$, we can form the system of equations:

^min'^min  —  4^min^  (20) 

and subsequently solve it for $w_{min}$ via least-squares. An adaptive solution can be obtained via the LMS algorithm.

Due  to  the  structure  of  Wmin{n),  as  defined  in 
(16),  and  from  Proposition  1,  the  cepstrum  of  Wmin{fi) 
equals 

^mm(^^)  =  2/12  (^)  +  *i^i(”?^)  (21) 

where  ^.(n)  denotes  the  cepstrum  of  the  minimum 
phase  part  of  hi[n).  Since  2/13 (n)  is  a  causal  sequence 
and  (n)  is  a  noncausal  sequence,  (21)  can  be  used  to 
obtain  the  causal  cepstra  of  the  two  channels.  A  sim¬ 
ilar  procedure  can  be  followed  to  yield  the  noncausal 
cepstra  dh^{n)  and  based  on  the  minimization 

of  the  phase  of  Emax{^)‘  Finally,  the  channels  can  be 
reconstructed  via  inverse  cepstra  operations,  within  a 
scalar  and  a  time  delay.  In  the  channel  equalization 
case,  where  the  inverse  channels  are  of  interest,  they 
can  be  reconstructed  as 

ftr(n)  =  F{e^{-*Kn)}},  (22) 

In  many  cases  the  inverse  channel  obtained  that  way 
may  enhance  the  noise  at  the  receiver,  thus  raising  the 
probability  of  error  at  the  decision  device.  This  prob¬ 
lem  can  be  bypassed  by  using  the  so-called  constrained 
Wiener  filter  approach  [7] ,  which  estimates  the  desired 
input  symbols  in  a  least-squares  sense.  This  method 
was  used  in  our  case. 

4.  Implementation  Issues 

There  are  several  issues  in  the  reconstruction  proce¬ 
dure  that  have  to  be  addressed.  As  seen  from  (21), 
Wmtn(n)  is  a  two-sided  sequence  of  unknown  length 
Lmin-  The  length  of  Wmin,  together  with  the  length 
N2  of  its  causal  part,  have  to  be  taken  into  account  in 
forming  (17).  Moreover,  the  phase  V’mjn(w),  as  given 

(19),  contains  a  linear  phase  component  (r/jj  — rfej)w 
and  also  a  constant  phase  c  =  arg{chi]  ~ 
both of which are unknown. Neglecting the term $c$ will result in a sequence $w_{min}(n)$ which will differ from the true one by a complex constant. The linear phase component does not pose additional problems either, since it corresponds to a circular shift of the original sequence.

Let us suppose that $L_{min}$ is known. Assuming an incorrect value for $N_2$ will effectively shift $w_{min}(n)$. This will appear as a circular shift in the reconstructed sequence $\hat{w}_{min}(n)$. Therefore, if two solutions exist for two different values of shift, they should differ by a time delay. If no solution exists for some amount of shift, the algorithm will not converge. To determine the existence of a solution we look for a low value of the mean square error corresponding to the shortest length $L_{min}$ ($L_{max}$).

Now  suppose  Lmin  is  unknown.  Assuming  a  value 

>  Lmin  for  the  length  of  Wmini^^)  (1^)) 
will  result  in  a  sequence  iUm,„(n)  which  will  have  the 
same  phase  as  Wmin{n),  but  greater  length.  There¬ 
fore,  will  be  related  to  Wminin)  by  a  zero- 

phase  sequence,  which  has  zeros  in  conjugate  reciprocal 
pairs.  This,  together  with  (21),  imply  that  the  mini¬ 
mum  phase  parts  of  the  reconstructed  channels  h[{n) 
and  ^^(n)  will  have  common  zeros.  The  number  of 
common  zeros  is  the  difference  between  L  and  Lmin, 
and  can  be  found  as  the  number  of  zero  eigenvalues 
of  the  Sylvester  matrix  formed  based  on  (n)  and 
*/i2(”)>  minimum-phase  equivalents  of  /ii(n)  and 
h2{n)  respectively. 

5.  Simulations 

A channel was generated from two delayed raised-cosine pulses, as an approximation to a two-ray multipath environment. The channel is given by

$h(t) = 0.2\,c(t, 0.11) + 0.4\,c(t - 2.5, 0.11)$,  (23)

where $c(t, \alpha)$ denotes a raised-cosine pulse and $\alpha$ is the roll-off factor. The length of the channel was taken to be 6 symbols. Two virtual channels $h_1(n)$ and $h_2(n)$ were generated by oversampling $h(t)$ by a factor of two. The source symbols were drawn from a 16-QAM signal constellation with uniform distribution. The noise processes were white, zero-mean and Gaussian distributed.
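The channel construction described above can be sketched as follows. Several details are assumptions made for the illustration: the symbol period is normalized to 1, the 6-symbol window is taken to start at $t = 0$ (the text does not state the time origin), and the two virtual channels are taken as the even- and odd-indexed half-symbol samples.

```python
import math

def raised_cosine(t, alpha):
    """Raised-cosine pulse with unit symbol period and roll-off factor alpha."""
    if abs(t) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * alpha * t) ** 2
    if abs(denom) < 1e-12:
        # Removable singularity at |t| = 1 / (2 alpha).
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * alpha * t) / denom

def h(t):
    """Two-ray channel: two delayed raised-cosine pulses with roll-off 0.11."""
    return 0.2 * raised_cosine(t, 0.11) + 0.4 * raised_cosine(t - 2.5, 0.11)

# Oversample by a factor of two over 6 symbol periods (assumed to start at t = 0).
samples = [h(0.5 * n) for n in range(12)]

# Virtual channels: polyphase components of the oversampled response.
h1 = samples[0::2]      # samples at t = 0, 1, 2, ...
h2 = samples[1::2]      # samples at t = 0.5, 1.5, 2.5, ...
```

With this choice the stronger ray, centred at $t = 2.5$, falls exactly on a sample of the second virtual channel; a different time origin would change the individual coefficients but not the construction.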

In  the  implementation  of  the  algorithm  we  used  100 
input  symbols,  solving  (20)  in  a  least-squares  sense. 
The  average  least-squares  errors  over  100  simulations 
were  used  to  estimate  the  lengths  Lmin,Lmax  and  the 
parameter  N2-  The  first  substantial  drops  of  the  errors 
ocurred  at  the  correct  values,  however  a  length  mis¬ 
match  was  found  not  to  cause  errors  to  the  estimation 
procedure.  Fig.  1  shows  the  actual  channels  and  the 
sample  mean  of  100  estimates  of  the  channels  for  SNR 
10  dB.  Fig.  2  shows  a  plot  of  1000  output  symbols  of 
the  unequalized  channel  hi(n)  corrupted  by  additive 
noise  at  SNR  20  dB.  The  channels  were  then  recon¬ 
structed  and  equalized.  1000  symbols  were  then  trans¬ 
mitted  and  the  equalized  channel  output  is  shown  in 
Fig.  3,  which  indicates  that  the  eye  is  well  opened. 
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Figure  1.  True  (solid  line)  and  recon¬ 
structed  channels  at  SNR=10  dB.  Dashed 
line  indicates  sample  mean  of  100  Monte 
Carlo  runs  of  the  reconstructed  channels. 
Dotted  lines  indicate  standard  deviation. 
100  output  symbols  were  used  in  the  esti¬ 
mation  procedure. 


Figure  2.  The  output  of  the  unequalized 
channel  fti(n),  for  SNR=20  dB. 


Figure 3. The output of the equalized channel $h_1(n)$. 1000 symbols were plotted. 100 output symbols were used in the estimation, at SNR=20 dB.

Abstract

We consider additive nonlinear Autoregressive Exogenous (ARX) time series and we propose projections as a means of identifying and estimating their endogenous and exogenous components. The estimates are nonparametric in nature and involve averaging of kernel-type estimates. Such estimates have very recently been treated informally in a univariate time series situation. Here we extend the scope to nonlinear ARX models and present a rigorous theory, including the establishment of consistency and asymptotic normality for the projection estimates.


1.  Introduction 

Nonlinear time series analysis has received much attention in recent years due primarily to the fact that linear models, such as ARMA, fail to capture many nonlinear features present in commonly encountered time series, as in econometric data. See Tjøstheim [10] for a recent review. Both parametric (Lewis and Stevens [4] and Granger and Teräsvirta [3]) and nonparametric models (Chen and Tsay [2]) were introduced in the literature. While a rigorous theory is available in the parametric case (Pötscher and Prucha (1991)), much less has so far been achieved for nonparametric models; the work of Chen and Tsay is algorithmic/computational in nature, lacking analytical convergence results.

In this paper we consider nonlinear bivariate autoregressive exogenous (ARX) time series modeled by:

$Y_{t+q+1} = g_1(Y_{t+1}, \ldots, Y_{t+q}) + g_2(X_{t+1}, \ldots, X_{t+p}) + \varepsilon_{t+q+1}$  (1)

$X_{t+p} = g_3(X_{t+1}, \ldots, X_{t+p-1}) + e_{t+p}$,

where $p \le q + 1$ and $\{\varepsilon_t\}$ and $\{e_t\}$ are independent series each consisting of zero-mean independent identically distributed (iid) variables with variances $\sigma_\varepsilon^2$ and $\sigma_e^2$, respectively.


Under certain regularity conditions, the bivariate process (1) is jointly stationary. The variables $\{X_t\}$ and $\{Y_t\}$ are exogenous and endogenous, respectively. The nonlinear ARX model is of fundamental importance in modelling econometric time series. A popular subclass is when both $g_1$ and $g_2$ are themselves additive so that, for example,

$g_1(y_1, \ldots, y_q) = \sum_{i=1}^{q} g_{1,i}(y_i)$

(see for example Chen and Tsay [2]). We do not limit ourselves to this special case. Our goal is to identify/estimate the functional structure of the time series from the observations $\{Y_t, X_t\}$. We note that

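$E[X_{t+p} \mid X_{t+1} = x_1, \ldots, X_{t+p-1} = x_{p-1}] = g_3(x_1, \ldots, x_{p-1})$  (2)

and

$m(x_1, \ldots, x_p; y_1, \ldots, y_q) := E[Y_{t+q+1} \mid X_{t+1} = x_1, \ldots, X_{t+p} = x_p, Y_{t+1} = y_1, \ldots, Y_{t+q} = y_q] = g_1(y_1, \ldots, y_q) + g_2(x_1, \ldots, x_p)$.  (3)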

The function $g_3$ can be estimated in a straightforward manner by kernel methods and was treated in Masry and Tjøstheim [5]. From (3) it is seen that regression estimation methods could consistently estimate only the sum of the functions $g_1$ and $g_2$. Our approach to additive modeling in general and to the additive ARX model (1) in particular is through projections. Projections were introduced by Auestad and Tjøstheim in [1], [11] for a univariate additive model with the purpose of identifying the functional structure of the components. In this paper we establish the consistency and asymptotic normality of projection estimates for the more general ARX framework. In this way we extend rigorous analysis of estimates in additive models from the independent component case of Stone [9] to the present ARX case, where dependence is an integral part of the system.

We remark that projection estimation theory draws on traditional kernel regression results (cf. Robinson [8]) and on corresponding results for the ARCH (autoregressive conditionally heteroskedastic) model treated in Masry and Tjøstheim [5].

It is possible to generalize the system (1) to a full nonlinear ARX-ARCH system by multiplying $\varepsilon_{t+q+1}$ and $e_{t+p}$ by nonlinear functions $g_4(Y_{t+1}, \ldots, Y_{t+q})$ and $g_5(X_{t+1}, \ldots, X_{t+p-1})$, respectively. A univariate ARCH system was treated at length in Masry and Tjøstheim [5]. By combining the results of that paper with the theory of the present one it is possible to construct an ARX-ARCH estimation theory.

2.  Projection  Estimates 

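Projection estimates are defined as follows. Let

$\mathbf{Y}_t = (Y_{t+1}, \ldots, Y_{t+q})$ and $\mathbf{X}_t = (X_{t+1}, \ldots, X_{t+p})$.

The vectors $\mathbf{x}$ and $\mathbf{y}$ are defined analogously, and a more general version of (3) can be written as

$m(\mathbf{x}, \mathbf{y}; p, q) = E\{\phi(Y_{t+q+1}) \mid \mathbf{X}_t = \mathbf{x}, \mathbf{Y}_t = \mathbf{y}\}$  (4)

where $\phi$ is measurable on the real line and $E\{|\phi(Y_t)|\} < \infty$. The introduction of $\phi$ allows us to estimate conditional moments, $\phi(y) = y^k$, and conditional distributions, $\phi(y) = I\{y \le u\}$. Set

$h(\mathbf{x}, \mathbf{y}; p, q) = m(\mathbf{x}, \mathbf{y}; p, q)\, f(\mathbf{x}, \mathbf{y}; p, q)$  (5)

where $f(\mathbf{x}, \mathbf{y}; p, q)$ is the joint density of $(\mathbf{X}_t, \mathbf{Y}_t)$, assumed to exist. Let $b_n$ be the bandwidth parameter and set

$K_n(\mathbf{u}) = b_n^{-(p+q)} K(\mathbf{u}/b_n)$

where $K(\mathbf{u})$ is a kernel function. Given the observations $\{X_t, Y_t\}$ we estimate $f$ by

$\hat f_n(\mathbf{x}, \mathbf{y}; p, q) = \frac{1}{n_1 + 1} \sum_{t=0}^{n_1} K_n(\mathbf{x} - \mathbf{X}_t, \mathbf{y} - \mathbf{Y}_t)$  (6)

and

$\hat h_n(\mathbf{x}, \mathbf{y}; p, q) = \frac{1}{n_1 + 1} \sum_{t=0}^{n_1} \phi(Y_{t+q+1})\, K_n(\mathbf{x} - \mathbf{X}_t, \mathbf{y} - \mathbf{Y}_t)$  (7)

where $n_1 = n - q - 2$ is assumed to be positive. We now estimate $m(\mathbf{x}, \mathbf{y}; p, q)$ by

$\hat m_n(\mathbf{x}, \mathbf{y}; p, q) = \hat h_n(\mathbf{x}, \mathbf{y}; p, q) / \hat f_n(\mathbf{x}, \mathbf{y}; p, q)$.  (8)

For the ARX model (1) we estimate the sum of the functions $g_1$ and $g_2$ as follows:

$\widehat{g_1(\mathbf{y}) + g_2(\mathbf{x})} = \hat m_n(\mathbf{x}, \mathbf{y}; p, q)$  (9)

where $\hat m_n$ is given in (8) with $\phi(x) = x$. We then employ the projection technique to estimate $g_1(\mathbf{y})$ essentially up to scale and location. Let $S_1$ be a compact subset of $\mathbb{R}^p$ and $S_2$ be a compact subset of $\mathbb{R}^q$ and let $D = S_1 \times S_2$. Put

$w(\mathbf{x}, \mathbf{y}) = 1$ for $(\mathbf{x}, \mathbf{y}) \in D$, and $w(\mathbf{x}, \mathbf{y}) = 0$ for $(\mathbf{x}, \mathbf{y}) \notin D$.  (10)

Then define the projection

$P_y(\mathbf{y}) = E[m(\mathbf{X}_t, \mathbf{y}; p, q)\, w(\mathbf{X}_t, \mathbf{y})]$  (11)

and, for the ARX system (1), we then have

$P_y(\mathbf{y}) = 1_{S_2}(\mathbf{y}) [ g_1(\mathbf{y}) \Pr\{\mathbf{X}_0 \in S_1\} + E\{ g_2(\mathbf{X}_0) 1_{S_1}(\mathbf{X}_0) \} ]$.

Thus the projection identifies $g_1(\mathbf{y})$ for $\mathbf{y} \in S_2$ up to a multiplicative and additive constant. Moreover, it is seen that the multiplicative constant will approach 1 when the support $D$ of $w$ is taken to be large enough.

In view of (11), we estimate $P_y(\mathbf{y})$ by

$\hat P_y(\mathbf{y}) = \frac{1}{n_2 + 1} \sum_{t=0}^{n_2} w(\mathbf{X}_t, \mathbf{y})\, \hat m_n(\mathbf{X}_t, \mathbf{y}; p, q)$  (12)

where $n_2 = n - p - 1$ and $\hat m_n(\mathbf{x}, \mathbf{y}; p, q)$ is given by (8). One can similarly estimate the function $g_2$, related to the projection

$P_x(\mathbf{x}) = 1_{S_1}(\mathbf{x}) [ g_2(\mathbf{x}) \Pr\{\mathbf{Y}_0 \in S_2\} + E\{ g_1(\mathbf{Y}_0) 1_{S_2}(\mathbf{Y}_0) \} ]$,

by

$\hat P_x(\mathbf{x}) = \frac{1}{n_3 + 1} \sum_{t=0}^{n_3} w(\mathbf{x}, \mathbf{Y}_t)\, \hat m_n(\mathbf{x}, \mathbf{Y}_t; p, q)$  (13)

where $n_3 = n - q - 1$.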


3.  Results 

We present in this section the main results of the paper without proofs. The full proofs, along with the precise regularity conditions needed, can be found in Masry and Tjøstheim [6]. We first show that the projection estimate $\hat P_y(\mathbf{y})$ is consistent. We have

Theorem 1. Assume that the functions $f$ and $h$ are Lip $\gamma$ for some $0 < \gamma < 1$ and that the bandwidth parameter $b_n$ satisfies $n b_n^q \to \infty$ as $n \to \infty$. Then, under certain regularity conditions we have

$\hat P_y(\mathbf{y}) - P_y(\mathbf{y}) = O_p\big((n b_n^q)^{-1/2}\big) + O_p(b_n^\gamma)$.

The first term on the right side is the contribution of the variance and the second term is the contribution of the bias. Note that the above rates for the projection estimates coincide with the classical regression rates.

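Now define

$H(\mathbf{x}, \mathbf{y}) = w(\mathbf{x}, \mathbf{y})\, f(\mathbf{x}; p) / f(\mathbf{x}, \mathbf{y}; p, q)$

and

$V_\phi(\mathbf{u}, \mathbf{v}; p, q) = \mathrm{var}\{\phi(Y_{t+q+1}) \mid \mathbf{X}_t = \mathbf{u}, \mathbf{Y}_t = \mathbf{v}\}$.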

Also  define 

a^(w.y)  = 

p;  P,  q)  +  {m(u,  y ;  p,  q)  -  m(«,  y ;  p,  g)}2j 

X y)f(li, y ;  p,  q)du.  (14) 

Assume  that  the  kernel  K{u)  on  is  factorable:  With 
Ji  =  y)*  We  then  have  the 

following result on the asymptotic normality of the projection estimate $\hat P_y(\mathbf{y})$.

Theorem 2. Assume that the bandwidth parameter $b_n$ satisfies $n b_n^q \to \infty$ as $n \to \infty$. Then, under certain regularity conditions we have

inbiffHMy)-Py{y)-Bn(x,y)} 

^^f{0,a\y,^\\K^^\^  (15) 

at  continuity  points  of  a^(r ,  j/)  as  a  function  of  u. 

Remark. The term $B_n(\mathbf{x}, \mathbf{y})$ in Theorem 2 represents the "bias" of the projection estimates. When the functions $f$ and $h$ are Lip $\gamma$, as in Theorem 1, then $B_n(\mathbf{x}, \mathbf{y}) = O_p(b_n^\gamma)$. It is seen from Theorem 2 that the projection estimate $\hat P_y(\mathbf{y})$ is asymptotically normal, and a precise expression for the asymptotic variance is given in (15).

4.  Example 

We  carried  out  a  small  simulation  experiment  for  the  first 
order  system 

Yt+i  =  0.5y,  +  +  c,+i  (16) 

$X_{t+1} = 0.5\,X_t + e_{t+1}$  (17)


where $\{\varepsilon_t\}$ and $\{e_t\}$ are generated as independent processes consisting of Gaussian iid random variables with zero mean. The $\{X_t\}$ and $\{Y_t\}$ processes were subsequently adjusted so that they have zero mean (already the case for the $\{X_t\}$-process) and unit variance. The projection estimates (12) and (13) were computed for the scaled processes, taking the set $[-3, 3] \times [-3, 3]$ as the compact set $D$ and using a single bandwidth $b_n$. The results clearly reveal the linear dependence on $Y_t$ and the quadratic $x$-dependence on $X_{t+1}$ in (16).
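
A small numerical sketch of this kind of experiment is given below. It is an illustration under explicit assumptions, not the authors' simulation: the coefficient `a` of the quadratic term, the unit noise variances, the Gaussian product kernel and the bandwidth `b` are all hypothetical choices, and the function names are invented for the example. It computes the projection estimate of the $y$-component as in (12), with the regression estimate formed as a ratio of kernel sums as in (8).

```python
import numpy as np

def simulate_arx(n, a=0.5, seed=0, burn=200):
    """First-order system: X_{t+1} = 0.5 X_t + e, Y_{t+1} = 0.5 Y_t + a X_{t+1}^2 + eps.
    The coefficient a and the unit noise variances are assumptions."""
    rng = np.random.default_rng(seed)
    total = n + burn + 1
    e = rng.standard_normal(total)
    eps = rng.standard_normal(total)
    x = np.zeros(total)
    y = np.zeros(total)
    for t in range(total - 1):
        x[t + 1] = 0.5 * x[t] + e[t + 1]
        y[t + 1] = 0.5 * y[t] + a * x[t + 1] ** 2 + eps[t + 1]
    x, y = x[burn:], y[burn:]
    # Scale both processes to zero mean and unit variance.
    return (x - x.mean()) / x.std(), (y - y.mean()) / y.std()

def gauss(u):
    return np.exp(-0.5 * u ** 2)

def projection_y(y_grid, Xr, Yl, R, b, lo=-3.0, hi=3.0):
    """Average the kernel regression estimate m_hat(X_t, y) over the sample X_t,
    with weight 1 inside [lo, hi] and 0 outside."""
    Xs = Xr[(Xr >= lo) & (Xr <= hi)]
    Kx = gauss((Xs[:, None] - Xr[None, :]) / b)   # kernel weights in x
    P = np.empty(len(y_grid))
    for j, yv in enumerate(y_grid):
        ky = gauss((yv - Yl) / b)                  # kernel weights in y
        m_hat = (Kx @ (ky * R)) / (Kx @ ky)        # ratio of kernel sums
        P[j] = m_hat.sum() / len(Xr)
    return P
```

With `x, y = simulate_arx(800)`, the call `projection_y(grid, x[1:], y[:-1], y[1:], b=0.4)` on a grid of $y$ values returns a curve that should increase roughly linearly in $y$, up to the multiplicative and additive constants discussed in Section 2.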

ABSTRACT

Blind equalization of general Volterra models has not been addressed, despite its practical value in communications, acoustics, and physiological modeling. Relying upon diversity (sufficient number of multiple outputs), we establish existence and uniqueness conditions which guarantee that single-input, FIR nonlinear Volterra channels can be perfectly but blindly equalized using linear FIR equalizers. Apart from a minimal order persistence-of-excitation condition (also present with input-output approaches), the inaccessible input is allowed to be deterministic or random and of unknown color or distribution. With the kernels also satisfying a certain co-primeness condition, we develop direct blind equalizers which bypass the channel estimation step. Preliminary simulations corroborate our analytical results.


1.  INTRODUCTION 

Identification of nonlinear systems is of paramount importance in acoustics, physiological modeling [5], magnetic recording [1], satellite and microwave communication links [4]. Using input-output data, methods for identifying FIR Volterra models have been proposed (see e.g., [9]). But apart from special cases, dealing with memoryless nonlinearities and imposing extra conditions on the input [8], [7], the blind scenario has not been addressed. Its practical significance is evident with high-speed (over 5 kb/s) communication channels, especially when no training inputs are available or when new receivers are added in the link and transmission cannot be interrupted to initiate a new training session.

In this paper we address the blind equalization and identification of FIR nonlinear Volterra channels by exploiting the temporal and/or spatial diversity offered in the form of multichannel output time series. The latter are collected by oversampling the continuous-time data at a rate faster than the symbol rate and/or by multiple antennas. Diversity is also exploited in [10], [6], [11], [2] for blind identification and equalization of linear time-invariant FIR channels, and the present work extends these ideas to the challenging set-up of nonlinear Volterra models.

We present our results in the linear-quadratic case (proofs and generalizations to nonlinearities of arbitrary order are reported in [3]). To link temporal with spatial diversity we start with the (baseband) continuous-time Volterra model $x_c(t) = \sum_{l} s(l)\, h_{1c}(t - lT) + \sum_{l_1, l_2} s(l_1) s(l_2)\, h_{2c}(t - l_1 T, t - l_2 T)$, where $T$ is the symbol period. As with the linear case, oversampling by a rate of $M/T$ yields $x(n) := x_c(t)|_{t = nT/M} = \sum_{l} s(l)\, h_1(n - lM) + \sum_{l_1, l_2} s(l_1) s(l_2)\, h_2(n - l_1 M, n - l_2 M)$, where $h_1(n) := h_{1c}(nT/M)$, and similarly for $h_2(n_1, n_2)$. Time series $x(n)$ is cyclostationary with period $M$. But upon defining the sub-processes $x^{(m)}(n) := x(nM + m - 1)$, $m = 1, \ldots, M$, the $M$-channel process $\mathbf{x}'(n) := [x^{(1)}(n) \ldots x^{(M)}(n)]$ becomes stationary, and for $n = 0, 1, \ldots, N - 1$ is given by:

$\mathbf{x}'(n) = \sum_{l=0}^{L_1} \mathbf{h}_1'(l)\, s(n - l) + \mathbf{v}'(n) + \sum_{l_1=0}^{L_2} \sum_{l_2=0}^{l_1} \mathbf{h}_2'(l_1, l_2)\, s(n - l_1)\, s(n - l_2)$,  (1)

where: (i) prime denotes transpose and lower (upper) bold is used for vectors (matrices); (ii) the $M \times 1$ vector $\mathbf{h}_1$ ($\mathbf{h}_2$) corresponding to the linear (quadratic) kernel is defined similar to $\mathbf{x}$; (iii) the inaccessible scalar input $s(n)$ is allowed to be either deterministic or a realization of a random process (white or colored); (iv) the range of $l_2$ is chosen so that $\mathbf{h}_2(l_1, l_2)$ is defined over its non-redundant region $0 \le l_2 \le l_1$; (v) $\mathbf{v}(n)$ is AWGN (see also Fig. 1).
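
The SIMO linear-quadratic model (1) can be simulated directly. The sketch below is an illustration only: the array layout (`h1` of shape `(L1+1, M)`, `h2` of shape `(L2+1, L2+1, M)` read on its non-redundant region), the zero initial conditions for `s(n)` with `n < 0`, and the function name are assumptions made for the example.

```python
import numpy as np

def simo_linear_quadratic(s, h1, h2, noise_std=0.0, seed=0):
    """Generate the M-channel output x(n) of model (1).

    h1[l]       : linear kernel vector h_1(l), l = 0..L1
    h2[l1, l2]  : quadratic kernel vector h_2(l1, l2), used only for l2 <= l1
    Samples s(n) with n < 0 are taken to be zero (an assumption).
    """
    rng = np.random.default_rng(seed)
    N, M = len(s), h1.shape[1]
    L1, L2 = h1.shape[0] - 1, h2.shape[0] - 1

    def past(n):
        return s[n] if n >= 0 else 0.0

    x = np.zeros((N, M))
    for n in range(N):
        for l in range(L1 + 1):
            x[n] += h1[l] * past(n - l)
        for l1 in range(L2 + 1):
            for l2 in range(l1 + 1):      # non-redundant region 0 <= l2 <= l1
                x[n] += h2[l1, l2] * past(n - l1) * past(n - l2)
    return x + noise_std * rng.standard_normal((N, M))
```

Each row of the returned array is one vector sample $\mathbf{x}'(n)$; setting `noise_std=0.0` gives the noise-free data assumed in the rank results of Section 2.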


Figure  1.  SIMO  linear-quadratic  model 

Given $\{\mathbf{x}(n)\}_{n=0}^{N-1}$ obeying (1), we first transform it to a specific multi-input multi-output (MIMO) linear model, determine orders $L_1$, $L_2$, and establish conditions for FIR vector equalizers to exist (Section 2). Blind linear FIR equalizers $\{\mathbf{g}_{1,i}(k)\}_{k=0}^{K}$ of order $K$ satisfying (in the absence of noise) the zero-forcing condition

$\sum_{k=0}^{K} \mathbf{x}'(n - k)\, \mathbf{g}_{1,i}(k) = s(n - i)$,  (2)

are derived in Section 3 and simulated (with noise present) in Section 4. Their uniqueness is established within a shift $i \in [0, L_1 + K]$ which is non-identifiable in the blind case.

2.  MULTICHANNEL  APPROACH 

With new variables $i = l_1 - l_2$ and $l = l_2$, we view the 2-d kernel $\mathbf{h}_2(l_1, l_2)$ as a collection of $L_2 + 1$ linear (1-d) kernels defined as: $\mathbf{h}_{2i}(l) := \mathbf{h}_2(l + i, l)$, $i = 0, 1, \ldots, L_2$ and $l = 0, 1, \ldots, L_2 - i$. Such an interpretation reduces (1) to:

$\mathbf{x}'(n) = \sum_{l=0}^{L_1} \mathbf{h}_1'(l)\, s(n - l) + \mathbf{v}'(n) + \sum_{i=0}^{L_2} \Big[ \sum_{l=0}^{L_2 - i} \mathbf{h}_{2i}'(l)\, s(n - i - l)\, s(n - l) \Big]$,  (3)

and casts the SIMO problem into a MIMO one, but with special $L_2 + 2$ inputs $s_1(n) = s(n)$, $s_{20}(n) = s^2(n)$, $s_{21}(n) = s(n - 1) s(n)$, $\ldots$, $s_{2L_2}(n) = s(n - L_2) s(n)$.

Let us now define the $(L_1 + K + 1) \times M(K + 1)$ block Toeplitz matrix associated with $\mathbf{h}_1(l)$

$\mathbf{H}_1 := \begin{bmatrix} \mathbf{h}_1'(0) & \cdots & \mathbf{0}' \\ \vdots & & \vdots \\ \mathbf{h}_1'(L_1) & \cdots & \mathbf{h}_1'(L_1 - K) \\ \vdots & & \vdots \\ \mathbf{0}' & \cdots & \mathbf{h}_1'(L_1) \end{bmatrix}$.

Similarly, for each $i \in [0, L_2]$, we denote the $(L_2 + K + 1 - i) \times M(K + 1)$ matrix corresponding to $\mathbf{h}_{2i}(l)$ as $\mathbf{H}_{2i}$. We also define the $1 \times (L_1 + K + 1)$ input vector $\mathbf{s}_{10}'(n) := [s(n), s(n - 1), \ldots, s(n - K - L_1)]$ and respectively for each $i \in [0, L_2]$ the $1 \times (L_2 + K + 1 - i)$ vector $\mathbf{s}_{2i}'(n) := [s(n - i) s(n), \ldots, s(n - K - L_2) s(n - K - L_2 + i)]$. With these definitions, we obtain the matrix version of (3), $\mathbf{X} = \mathbf{S}\mathbf{H}$, where the $(N - K) \times M(K + 1)$ block Hankel data matrix $\mathbf{X}$ is formed as

$\mathbf{X} := \begin{bmatrix} \mathbf{x}'(N - 1) & \cdots & \mathbf{x}'(N - 1 - K) \\ \vdots & & \vdots \\ \mathbf{x}'(K) & \cdots & \mathbf{x}'(0) \end{bmatrix}$  (4)

and the $(N - K) \times d(L_1, L_2, K)$ input matrix $\mathbf{S}$ and the $d(L_1, L_2, K) \times M(K + 1)$ system matrix $\mathbf{H}$ are given by

$\mathbf{S} := \begin{bmatrix} \mathbf{s}_{10}'(N - 1) & \mathbf{s}_{20}'(N - 1) & \cdots & \mathbf{s}_{2L_2}'(N - 1) \\ \mathbf{s}_{10}'(N - 2) & \mathbf{s}_{20}'(N - 2) & \cdots & \mathbf{s}_{2L_2}'(N - 2) \\ \vdots & & & \vdots \\ \mathbf{s}_{10}'(K) & \mathbf{s}_{20}'(K) & \cdots & \mathbf{s}_{2L_2}'(K) \end{bmatrix}$,  $\mathbf{H} := \begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_{20} \\ \vdots \\ \mathbf{H}_{2L_2} \end{bmatrix}$.

The common dimension of $\mathbf{S}$ and $\mathbf{H}$ is $d(L_1, L_2, K) = L_1 + K + 1 + \sum_{i=0}^{L_2} (L_2 + K + 1 - i)$, or,

$d(L_1, L_2, K) = (L_2 + 2)(K + 1) + (L_2^2 + L_2 + 2L_1)/2$.  (5)

We adopt the following assumptions:

(a1) $N - K > \max(L_1, L_2) + K + 1$, which is easily met in practice by collecting sufficient data;

(a2) input $s(n)$ is persistently exciting (p.e.) of order $\rho_s = d(L_1, L_2, K)$; i.e., $\mathrm{rank}(\mathbf{S}) = d(L_1, L_2, K)$. White noise is p.e. of any order, but $d(L_1, L_2, K)$ modes in the spectrum of $s(n)$ may not guarantee p.e. as in the linear case; $s(n)$ must also have sufficient amplitude levels (note that if e.g., $s(n) = 0, 1$, $\mathbf{S}$ is rank deficient because $s(n) = s^2(n)$);

(a3) quadruplet $(M, K, L_1, L_2)$ obeys

$M(K + 1) \ge d(L_1, L_2, K)$,  (6)

which for a given $M$ and $(L_1, L_2)$ is satisfied by choosing $K \ge \lceil (L_2^2 + L_2 + 2L_1) / 2(M - L_2 - 2) \rceil - 1$, where $\lceil a \rceil$ denotes the smallest integer $\ge a$. The minimum number of channels required is thus $M_{\min} = L_2 + 3$, which depends on the memory of the nonlinearity; e.g., memoryless nonlinearities (often encountered with satellite links) correspond to $i = 0$ in (3) and require $M_{\min} = 3$ channels and a minimum equalizer order $K_{\min} = L_1 + L_2 - 1$ (recall that for linear channels $M_{\min} = 2$ and $K_{\min} = L_1 - 1$) [10], [11];

(a4) matrix $\mathbf{H}$ has full row rank; i.e., $\mathrm{rank}(\mathbf{H}) = d(L_1, L_2, K)$, which implies that there are no common zeros among the 1-d kernel transfer functions $\{H_1^{(m)}(z), H_{20}^{(m)}(z), \ldots, H_{2L_2}^{(m)}(z)\}$ across all $M$ channels.

We stress that apart from p.e. (also needed for input-output methods) no extra assumptions are imposed on $s(n)$. Matrix $\mathbf{H}$ must be at least fat (square if equality holds in (6)), which along with (a4) expresses the need for diversity (sufficient number of "sufficiently different" channels).


2.1. Order determination

Assume noise-free data and equality in (6) to obtain, [3]:
Lemma 1: Under (a1)-(a4), matrix $\mathbf{X}$ in (4) has $\mathrm{rank}(\mathbf{X}) = d(L_1, L_2, K)$. □

Let $(\bar L_1, \bar L_2)$ be known upper bounds on $(L_1, L_2)$. With $M$ given, choose two distinct orders $(K_1, K_2)$ both $\ge K_{\min} := \lceil (\bar L_2^2 + \bar L_2 + 2\bar L_1) / 2(M - \bar L_2 - 2) \rceil - 1$, and form matrices $\mathbf{X}_1, \mathbf{X}_2$ as in (4). It follows from Lemma 1 that $\rho_i := \mathrm{rank}(\mathbf{X}_i) = d(L_1, L_2, K_i)$, $i = 1, 2$. Knowing $(K_1, K_2)$, evaluating $\rho_1, \rho_2$ (using SVD), and using (5), we establish:
Corollary 1: For the model in (1), the orders $L_1, L_2$ can be found as $L_2 = (\rho_1 - \rho_2)/(K_1 - K_2) - 2$, and $L_1 = \rho_1 - (L_2 + 2)(K_1 + 1) - (L_2^2 + L_2)/2$. □

From now on we assume $L_1, L_2$ known and choose $K$ to satisfy (6) for a given $M$.
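
The bookkeeping in (5), (6) and Corollary 1 is simple integer arithmetic, and a short sketch makes it easy to check. The function names below are hypothetical; the formulas are those stated in the text, and the ranks passed to `orders_from_ranks` are assumed to have been obtained elsewhere (for example from an SVD of noise-free data matrices).

```python
from math import ceil

def d(L1, L2, K):
    """Common dimension of S and H, eq. (5). L2^2 + L2 is always even."""
    return (L2 + 2) * (K + 1) + (L2 * L2 + L2 + 2 * L1) // 2

def k_min(L1, L2, M):
    """Smallest equalizer order K satisfying M(K + 1) >= d(L1, L2, K)."""
    if M < L2 + 3:
        raise ValueError("at least L2 + 3 channels are required")
    return ceil((L2 * L2 + L2 + 2 * L1) / (2 * (M - L2 - 2))) - 1

def orders_from_ranks(rho1, rho2, K1, K2):
    """Recover (L1, L2) from rank(X_1), rank(X_2) as in Corollary 1."""
    L2 = (rho1 - rho2) // (K1 - K2) - 2
    L1 = rho1 - (L2 + 2) * (K1 + 1) - (L2 * L2 + L2) // 2
    return L1, L2
```

For instance, with $L_1 = 2$, $L_2 = 1$ and $M = 4$ channels, `d(2, 1, 2)` is 12 and `k_min(2, 1, 4)` is 2, so $M(K + 1) = 12$ meets (6) with equality.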

2.2. Existence and uniqueness

Consider (2) with $n = N - 1, \ldots, K$, and define $\mathbf{g}_{1,i} := [\mathbf{g}_{1,i}'(0) \cdots \mathbf{g}_{1,i}'(K)]'$ to obtain the matrix forms

$\mathbf{X} \mathbf{g}_{1,i} = \mathbf{s}_i \Leftrightarrow \mathbf{S} \mathbf{H} \mathbf{g}_{1,i} = \mathbf{s}_i$,  (7)

where $\mathbf{s}_i$ denotes the $(i + 1)$st column of $\mathbf{S}$. But (7) holds iff $\mathbf{H} \mathbf{g}_{1,i} = \mathbf{e}_i$, where $\mathbf{e}_i$ is a $d(L_1, L_2, K) \times 1$ vector with unity in its $(i + 1)$st entry and zero elsewhere. Given $\mathbf{H}$ and a fixed shift $i \in [0, L_1 + K]$, we prove that [3]:

Theorem 1: Under (a3) and (a4), a linear FIR equalizer $\mathbf{g}_{1,i} = \mathbf{H}^{\dagger} \mathbf{e}_i$, $i \in [0, L_1 + K]$, exists. It is unique if (6) holds as equality, or, if the pseudoinverse is adopted when (6) holds as strict inequality (minimum-norm solution). □
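
Theorem 1 can be checked numerically on a small example. The sketch below is illustrative rather than the authors' implementation: it assumes randomly drawn kernels (generically satisfying the full-rank condition), a Gaussian input, zero initial conditions, and hypothetical helper names. It builds $\mathbf{H}$ from block Toeplitz pieces, forms the data matrix of (4), and applies the pseudoinverse solution.

```python
import numpy as np

def block_toeplitz(h, K):
    """(L+K+1) x M(K+1) matrix whose (j, k) block is h'(j - k); h has shape (L+1, M)."""
    rows, M = h.shape
    T = np.zeros((rows + K, M * (K + 1)))
    for k in range(K + 1):
        T[k:k + rows, k * M:(k + 1) * M] = h
    return T

def system_matrix(h1, h2i, K):
    """Stack H_1, H_20, ..., H_2L2; h2i[i] holds kernel h_2i(l), l = 0..L2-i."""
    return np.vstack([block_toeplitz(h1, K)] + [block_toeplitz(h, K) for h in h2i])

def simulate(s, h1, h2i):
    """Noise-free outputs of model (3), with s(n) = 0 for n < 0 (assumption)."""
    x = np.zeros((len(s), h1.shape[1]))
    for n in range(len(s)):
        for l in range(h1.shape[0]):
            if n - l >= 0:
                x[n] += h1[l] * s[n - l]
        for i, h in enumerate(h2i):
            for l in range(h.shape[0]):
                if n - i - l >= 0:
                    x[n] += h[l] * s[n - i - l] * s[n - l]
    return x

def data_matrix(x, K):
    """Rows [x'(n), x'(n-1), ..., x'(n-K)] for n = N-1, ..., K, as in (4)."""
    N = x.shape[0]
    return np.array([x[n - K:n + 1][::-1].ravel() for n in range(N - 1, K - 1, -1)])

def zf_equalizer(H, i):
    """Zero-forcing equalizer for shift i: pseudoinverse of H applied to e_i."""
    e = np.zeros(H.shape[0])
    e[i] = 1.0
    return np.linalg.pinv(H) @ e
```

With $L_1 = 2$, $L_2 = 1$, $M = 4$ and $K = 2$, the matrix returned by `system_matrix` is $12 \times 12$, and `data_matrix(x, K) @ zf_equalizer(H, i)` reproduces the delayed input $s(n - i)$ on noise-free data.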

The vital role of diversity offered by multichannel data is transparent if we note that (6) is not satisfied with $M = 1$; i.e., single-channel linear FIR equalizers of FIR Volterra channels are impossible. With $M_{\min} = L_2 + 3$, (6) is satisfied as an equality if we choose $K_{\min} = (L_2^2 + L_2 + 2L_1 - 2)/2$. On the other hand, minimum-norm solutions are preferable for suppressing AWGN as in the linear channel case [2].

Although the number of antennas (and thus complexity) increases in the nonlinear case, our ability to equalize nonlinear channels with linear FIR equalizers is very appealing, especially because stability of inverse Volterra systems is difficult to define and even more difficult to check.

3.  DIRECT  BLIND  EQUALIZERS 

To solve (7) in the blind case we must eliminate the input dependence. Consider (2) with $i = 0$ to find $\sum_{k=0}^{K} \mathbf{x}'(n - k)\, \mathbf{g}_{1,0}(k) = s(n)$, and also substitute $n + i \to n$ in (2) to obtain $\sum_{k=0}^{K} \mathbf{x}'(n + i - k)\, \mathbf{g}_{1,i}(k) = s(n)$. Eliminating $s(n)$ from these equations, we arrive at the cross-relation:

$\sum_{k=0}^{K} \mathbf{x}'(n - k)\, \mathbf{g}_{1,0}(k) = \sum_{k=0}^{K} \mathbf{x}'(n + i - k)\, \mathbf{g}_{1,i}(k)$,

which forms the basis of our blind approach. In matrix form we use Matlab's notation $\mathbf{X}(i_1 : i_2, :)$ to denote a submatrix of $\mathbf{X}$ formed by the $i_1$ through $i_2$ rows and all columns of $\mathbf{X}$. For the $\mathbf{s}_i$ vector in (7), it holds that $\mathbf{s}_0(i + 1 : N - K, :) = \mathbf{s}_i(1 : N - K - i, :)$, which implies that the left-hand sides of (7) for different $i$'s are related. Upon defining

$\mathbf{X}_i := \mathbf{X}(i + 1 : N - K, :)$,  $\mathbf{X}_{0,i} := \mathbf{X}(1 : N - K - i, :)$,

we infer that $\mathbf{X}_i \mathbf{g}_{1,0} = \mathbf{X}_{0,i} \mathbf{g}_{1,i}$; i.e., for $i = 1, \ldots, L_1 + K$,

$\mathcal{X}_{0,i}\, \mathbf{g}_{1,0i} := [\mathbf{X}_{0,i} \;\; -\mathbf{X}_i] \begin{bmatrix} \mathbf{g}_{1,i} \\ \mathbf{g}_{1,0} \end{bmatrix} = \mathbf{0}$.  (8)

The pair of equalizers $(\mathbf{g}_{1,0}, \mathbf{g}_{1,i})$ will be identifiable (up to a scale) as the eigenvector corresponding to the minimum eigenvalue of $\mathcal{X}_{0,i}' \mathcal{X}_{0,i}$ iff the nullity $\nu(\mathcal{X}_{0,i}) = 1$. It is also possible to collect all pairs of shifts $(0, 1), (0, 2), \ldots, (0, L_1 + K)$ and recover simultaneously equalizers corresponding to all shifts by solving $\mathcal{X}_{0:L_1+K}\, \mathbf{g}_1 = \mathbf{0}$; i.e.,

$\begin{bmatrix} \mathbf{X}_1 & -\mathbf{X}_{0,1} & & \mathbf{0} \\ \vdots & & \ddots & \\ \mathbf{X}_{L_1+K} & \mathbf{0} & & -\mathbf{X}_{0,L_1+K} \end{bmatrix} \begin{bmatrix} \mathbf{g}_{1,0} \\ \vdots \\ \mathbf{g}_{1,L_1+K} \end{bmatrix} = \mathbf{0}$,  (9)

where $\mathcal{X}_{0:L_1+K}$ has $(L_1 + K + 1) M (K + 1)$ columns. With regards to the ranks of $\mathcal{X}_{0,L_1+K}$ and $\mathcal{X}_{0:L_1+K}$ in (8) and (9) we prove [3]:

Theorem 2: Suppose $\mathbf{x}(n)$ comes from (1) with $\mathbf{v}(n) = \mathbf{0}$, (a1), (a3), (a4) are satisfied, and (6) holds as equality. If $L_1 > L_2$ and p.e. order $\rho_s \ge 2d(L_1, L_2, K) + 1$, then $\nu(\mathcal{X}_{0,L_1+K}) = 1$ and the minimum and maximum delay equalizers are identifiable from (8). If $L_1 > L_2$ and (a2) holds, then $\nu(\mathcal{X}_{0:L_1+K}) = 1$ and all equalizers in $\mathbf{g}_1$ are identifiable from (9). □

Note that (8) involves a smaller matrix than (9), but also requires stronger p.e. conditions. Among all $(0, i)$ pairs only the $(0, L_1 + K)$ pair of equalizers can be identified alone. Two questions arise at this point: when does $L_1 > L_2$ hold in practice? and what if $L_1 = L_2$? Condition $L_1 > L_2$ requires memory domination of the linear part, which is expected in most practical cases. Also, in magnetic recording applications we have $L_1 = L_2 = L = 2$ or 3, but $s(n) = 0, 1$; hence, $s(n) = s^2(n)$, which allows us to combine the quadratic kernel $\mathbf{h}_{20}(l)$ (of order $L$) with the linear one $\mathbf{h}_1(l)$, leaving the remaining kernels $\mathbf{h}_{2i}(l)$ with orders $L - i$, $i \in [1, L]$. In this case too, Theorem 2 applies because there is a single kernel attaining the maximum order $L$.

If $L_1 = L_2$, then it turns out that $\nu(\mathcal{X}_{0:L_1+K}) = 2$, in which case supervector $\mathbf{g}_1$ in (9) is a linear combination of the null eigenvectors $\mathbf{u}_1, \mathbf{u}_2$: $\mathbf{g}_1 = \lambda_1 \mathbf{u}_1 + \lambda_2 \mathbf{u}_2$. To determine the $\lambda_1, \lambda_2$ constants we take advantage of "quadratic" equalizers such as $\mathbf{g}_{20,i}$ whose output satisfies:

$\sum_{k=0}^{K} \mathbf{x}'(n - k)\, \mathbf{g}_{20,i}(k) = s^2(n - i)$.  (10)

It turns out that similar to $\mathbf{g}_1$, supervector $\mathbf{g}_{20}$ also satisfies (9) and hence, $\mathbf{g}_{20} = \mu_1 \mathbf{u}_1 + \mu_2 \mathbf{u}_2$. Eliminating $s(n - i)$ from (2) and (10), a cross-relation between $\mathbf{g}_1$ and $\mathbf{g}_{20}$ results which allows determination of the $(\lambda_1, \lambda_2, \mu_1, \mu_2)$ coefficients [3].

If the linear kernel is absent (homogeneous model) we again find $\nu(\mathcal{X}_{0:L_1+K}) = 1$, and the $\mathbf{g}_{20}$ equalizer is uniquely identifiable although of limited value since its output $s^2(n)$ is only sufficient when sign ambiguity is not a problem (e.g., transmission with non-negative signals).

Our conclusions on the nullity of $\mathcal{X}_{0:L_1+K}$ carry over to nonlinearities of arbitrary order although some cases are easier than others (e.g., odd order nonlinearities only) [3]. Equalizing nonlinear channels with linear deconvolvers is neat and can be justified intuitively if one views the vector equalizer as a beamformer which, thanks to its diversity, is capable of nulling the nonlinearities and equalizing the linear part. With equalizers corresponding to all possible shifts, we can align their outputs and average in order to estimate the input via (c.f. (2)):

$\hat s(n) = \frac{1}{L_1 + K + 1} \sum_{i=0}^{L_1+K} \Big[ \sum_{k=0}^{K} \mathbf{x}'(n + i - k)\, \hat{\mathbf{g}}_{1,i}(k) \Big]$.
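
The cross-relation approach of (8) for the shift pair $(0, L_1 + K)$ can be tried on synthetic noise-free data. The following sketch is an illustration under stated assumptions, not the authors' code: it uses randomly drawn kernels with $L_1 > L_2$, a Gaussian input (so that persistence of excitation holds generically), zero initial conditions, and hypothetical function names. The null vector is taken as the right singular vector of the smallest singular value, which is one common way to compute the minimum-eigenvalue eigenvector mentioned in the text.

```python
import numpy as np

def simulate(s, h1, h2i):
    """Noise-free outputs of model (3), with s(n) = 0 for n < 0 (assumption)."""
    x = np.zeros((len(s), h1.shape[1]))
    for n in range(len(s)):
        for l in range(h1.shape[0]):
            if n - l >= 0:
                x[n] += h1[l] * s[n - l]
        for i, h in enumerate(h2i):
            for l in range(h.shape[0]):
                if n - i - l >= 0:
                    x[n] += h[l] * s[n - i - l] * s[n - l]
    return x

def data_matrix(x, K):
    """Rows [x'(n), x'(n-1), ..., x'(n-K)] for n = N-1, ..., K, as in (4)."""
    N = x.shape[0]
    return np.array([x[n - K:n + 1][::-1].ravel() for n in range(N - 1, K - 1, -1)])

def blind_pair(X, i):
    """Solve [X_{0,i}  -X_i] [g_{1,i}; g_{1,0}] = 0 for the shift pair (0, i)."""
    rows, cols = X.shape
    X0i = X[:rows - i]            # X(1 : N-K-i, :)
    Xi = X[i:]                    # X(i+1 : N-K, :)
    A = np.hstack([X0i, -Xi])
    _, sv, Vt = np.linalg.svd(A)
    v = Vt[-1]                    # right singular vector of the smallest singular value
    return v[cols:], v[:cols], sv  # g_{1,0}, g_{1,i}, singular values
```

On noise-free data the output `X @ g0` of the zero-delay equalizer should match $s(n)$ up to an unknown scale factor, and the smallest singular value should be negligible relative to the largest. With noise present the null space is only approximate, and a (total) least-squares solution would typically be preferred.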


If blind equalization is the goal, direct equalizers offer advantages over indirect approaches which first estimate the channel $\mathbf{H}$ and then invert it to obtain the equalizer. When the noise spectrum is known, the Wiener inverse trades off perfect (or zero-forcing) equalization with SNR improvement. Similarly, SNR gain is obtained if (9) is solved using (weighted or total) least-squares depending on whether $\mathbf{v}(n)$ is colored or white.

If blind channel identification is the objective, the estimated equalizers can be used to recover $s(n)$ from which $\mathbf{H}$ can be obtained by solving (3) using (batch or recursive) linear regression methods. The linear forms of (8) and (9) should lend themselves to adaptive schemes which are currently under investigation together with linear prediction formulations and methods needed to select the optimum shift.

4.  SIMULATIONS 

Example 1: We generated 2-level PAM i.i.d. data ($s(n) = 0, 1$) and passed them through $M = 3$ FIR channels ($m = 1, 2, 3$) to obtain the data:

$x^{(m)}(n) = \sum_{l=0}^{2} h_1^{(m)}(l)\, s(n - l) + \sum_{l=0}^{1} h_{21}^{(m)}(l)\, s(n - l)\, s(n - l - 1) + v^{(m)}(n)$.

The impulse response vectors were $\mathbf{h}_1(0) = [1, 0.5, 2]'$, $\mathbf{h}_1(1) = [-2.5, 3, 0]'$, $\mathbf{h}_1(2) = [1, 5, 2]'$, $\mathbf{h}_{21}(0) = [2, 0.3, -0.7]'$, $\mathbf{h}_{21}(1) = [0.7, 1.2, 3]'$. Such a channel has a form similar to that used in magnetic recording models [1]. Theorem 2 applies to this channel ($L_1 = 2 > L_{21} = 1$) and using one SVD we computed the $3 \times 1$ vector equalizer of order $K = 2$ by solving (8) with $i = L_1 + K$. Fig. 2a depicts root mean-square error (rmse) between the true and estimated equalizer coefficients for lengths $N = 100, \ldots, 900$ at SNR 20 dB and 40 dB; rmse vs. SNR is shown in Fig. 2b for $N = 50, 100$ (averages were computed based on 100 Monte Carlo runs). Interestingly, with as few as $N = 200$ symbols, it is possible to equalize linear-quadratic channels with small rmse at SNR = 20 dB. A typical eye-diagram of one channel's output is plotted in Fig. 3a along with its equalized version in Fig. 3b.

Example 2: A similar simulation was carried out with 4-level PAM data ($s(n) = \pm 3, \pm 1$) and $M = 4$ channel outputs were generated according to the model ($m = 1, 2, 3, 4$):

$x^{(m)}(n) = \sum_{l=0}^{1} h_1^{(m)}(l)\, s(n - l) + \sum_{l=0}^{1} h_{20}^{(m)}(l)\, s^2(n - l) + h_{21}^{(m)}(0)\, s(n)\, s(n - 1) + v^{(m)}(n)$,

with $\mathbf{h}_1(0) = [1, 0.5, 2, 0.1]'$, $\mathbf{h}_1(1) = [-2.5, 3, 0, -1.1]'$, $\mathbf{h}_{20}(0) = [0.01, 0.5, 0.2, 0.03]'$, $\mathbf{h}_{20}(1) = [0.2, 0.3, -0.7, -0.001]'$, $\mathbf{h}_{21}(0) = [0.007, 0.001, 0.3, -0.15]'$. Figs. 4 and 5 show that about an order of magnitude more data are required to achieve performance similar to that in Figs. 2 and 3, a consequence of the fact that two SVDs are required for this model [3] (note that here $L_1 = L_{20} = 1$, $L_{21} = 0$, $K = 1$). To illustrate the importance of incorporating nonlinearities over adopting linear approximations, we supposed that the data come from a linear channel of order $L = 3$, and using $M = 2$ outputs we designed an order $K = 2$ linear equalizer by inverting the channel estimate of [11]. The equalized eye-patterns for the 2- and 4-level PAM data are shown in Fig. 6. The importance of adopting the correct model is evident if one compares Figs. 3 and 5 with Fig. 6.

Figure 2. RMSE curves for Example 1


Figure  3.  Eye-patterns  for  Example  1 

Figure 4. RMSE curves for Example 2

Figure 5. Eye-patterns for Example 2

Abstract*

In order to maximise the efficiency of the RF amplifier located in a transmitter, for instance in both analog and digital terrestrial TV links, the amplifier is forced to work near saturation, thus introducing an undesirable nonlinear effect. A common solution includes a predistortion system before the modulation that compensates as much as possible for the subsequent nonlinear distortion, in such a way that the overall transmitter behaves as a linear and efficient amplifier. Polynomial models usually implement the predistortion, but in this paper we propose an alternative model based on the Fourier-exponential series that shows better performance in the design stage without a significant increase in complexity.

1. Introduction

The distortion introduced by a High Power Amplifier (HPA) located in the transmitter of a terrestrial link is usually equalised by the so-called feed-forward method, the negative feedback method or the predistortion method. The first one has a cost limitation since it needs two HPAs, which are quite expensive elements of the RF link. The so-called negative feedback method is another RF or intermediate frequency solution, which has an inherent instability problem. On the contrary, the predistortion method has no loop (thus avoiding any instability) and it results in a cheap solution since it can be implemented at the baseband level by a DSP. Nevertheless, in order to be able to apply the predistortion at the symbol level, several aspects should be taken into account. First of all, the fact that the HPA is located in the transmitter ensures the introduced nonlinear distortion is memoryless. This property, along with the bandpass behaviour of the HPA [1], allows a lowpass equivalent formulation where the HPA is completely characterised by the so-called AM/AM and AM/PM curves, which relate the input amplitude to the output amplitude and output phase, respectively. These curves are supposed to be independent of the frequency for narrow bandpass signals and they can be obtained by measuring the output of the HPA when it is driven with a pure tone of the carrier frequency. The AM/AM and AM/PM relations are needed for the design stage, where the parameters of the predistortion are set. In general, a memoryless Volterra system is chosen to model the predistortion, its coefficients being fitted by an adaptive learning procedure that is applied periodically before the data transmission due to the slow time variation in the HPA characteristics.

In this paper, the authors propose an alternative system to model the predistortion that shows better performance in adaptive designs than the Volterra model does. Section 2 presents this model, which is based on a Fourier series expansion. Section 3 focuses on the particular HPA predistortion problem and brings out the role and design of the memoryless nonlinear models. Finally, Section 4 includes the simulation results, where the performance of the Fourier model is compared with that of the Volterra model for this particular problem.

2. The Fourier model

The Fourier model arises from the Fourier series expansion of the input/output relation of the actual nonlinear system (NLS). If g[x] denotes the relation of a given memoryless NLS, and x is the input, the approximation provided by an N-order Fourier model is the following:

$$ g[x] \approx \sum_{n=-N}^{N} c_n \exp(j n \omega_0 x) \qquad (1) $$

It is important to remark that, in order to avoid aliasing in the approximation provided by the Fourier model, the input signal x should be bounded, i.e. $x \in [-X_{max}, X_{max}]$, and the principal frequency should also be upper-bounded:

$$ \omega_0 \le \frac{2\pi}{2 X_{max}} \qquad (2) $$

Some previous work on the Fourier model has already been done, even for nonlinear problems with memory [2,3]. Without going into details, an important feature of this model is the fact that, once the order N and the principal frequency $\omega_0$ are chosen, the model is linear in the remaining coefficients, $\{c_n\}$. This property allows an MMSE criterion for designing the coefficients $\{c_n\}$ and, moreover, classical adaptive methods can be applied to drive the model to this solution [4].

In the HPA predistortion problem, the simplified versions of the model that consider an even or odd symmetry in the NLS input/output relation are especially interesting. Thus, the Fourier model allows a simplification when g[x] has an odd (eq.3.a) or an even (eq.3.b) symmetry. The complexity is considerably reduced in comparison with the general Fourier model (eq.1) because both the coefficients and the functions are real.

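$$ g[x] = -g[-x] \;\Rightarrow\; g[x] \approx \sum_{n=1}^{N} a_n \sin(n \omega_0 x) \qquad (3.a) $$

$$ g[x] = g[-x] \;\Rightarrow\; g[x] \approx \sum_{n=0}^{N} b_n \cos(n \omega_0 x) \qquad (3.b) $$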

Before dealing with the HPA predistortion problem, it is interesting to compare the Fourier model and the Volterra model in terms of complexity. An N-order Volterra model needs O(N) multiplications to provide the successive powers of the input signal, whereas an N-order Fourier model needs O(4N) real multiplications to compute the successive powers of the first-order complex exponential, $\exp(j\omega_0 x)$. In fact, the memoryless Fourier model can be basically viewed as an N-order Volterra model preceded by an exponential transformation (fig.1) and, in consequence, both models involve the same order of operations to generate the respective input data space (apart from the cost of computing the first complex exponential function).
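The complexity argument can be illustrated with a minimal sketch, assuming NumPy is available. The function names `fourier_basis` and `fourier_model` are hypothetical and not taken from the paper; the point is only that a single exponential is evaluated and every higher harmonic is obtained by one complex product.

```python
import numpy as np

def fourier_basis(x, n_order, omega0):
    """Return exp(j*n*omega0*x) for n = -N..N, built from successive powers."""
    x = np.asarray(x, dtype=float)
    e1 = np.exp(1j * omega0 * x)            # the only exponential evaluation
    pos = np.empty((n_order + 1,) + x.shape, dtype=complex)
    pos[0] = 1.0                            # harmonic n = 0
    for n in range(1, n_order + 1):
        pos[n] = pos[n - 1] * e1            # one complex product per harmonic
    neg = np.conj(pos[:0:-1])               # harmonics -N..-1 by conjugation
    return np.concatenate([neg, pos], axis=0)

def fourier_model(x, coeffs, omega0):
    """Evaluate the N-order Fourier model of eq.(1) for 2N+1 coefficients."""
    n_order = (len(coeffs) - 1) // 2
    basis = fourier_basis(x, n_order, omega0)
    return np.tensordot(np.asarray(coeffs), basis, axes=(0, 0))
```

With the coefficients `[-1/(2j), 0, 1/(2j)]` the model reduces to $\sin(\omega_0 x)$, which gives a quick sanity check of the ordering of the harmonics.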


Figure  1.  The  memoryless  Fourier  model 
implementation 


When dealing with the symmetric models, the Volterra model uses half of the operations, whereas the Fourier model (eq.3.a, 3.b) needs the same number because the cosine/sine functions are obtained as the real/imaginary parts of the respective exponential functions. Concerning the number of coefficients, which determines the computational load in the adaptive design, the Volterra model has N coefficients and the Fourier model has (2N+1) complex ones. Nevertheless, this number is N/2 for the symmetric Volterra model, and becomes N real coefficients for the Fourier Cosine or Sine models.


3.  The  HPA  predistortion  problem 

Let us consider a digital link with a transmitter such as the one shown in figure (2). In a practical situation, the predistortion is located before the modulation and is usually designed by means of an adaptive method applied prior to the data transmission.


Figure  2.  Adaptive  learning  of  the  predistortion 
system. 

The input to the HPA is denoted by x(t) and is a narrow bandpass signal, centred around the frequency $\omega_c$, with an instantaneous amplitude and phase represented by $R_t$ and $\theta_t$, respectively. Thus, the output of the HPA can be approximated by the following bandpass signal,

$$ y(t) = HPA[x(t)] = F[R_t] \cos(\omega_c t + \theta_t + \phi[R_t]) \qquad (4) $$

which involves the functions F[R] and $\phi[R]$ that represent the so-called AM/AM and AM/PM relations of the HPA. In the simulations, these functions approximate the actual AM/AM and AM/PM curves proposed in [5]. Whereas the amplitude distortion (fig.5.a) follows the relation,

$$ F[R] = \mathrm{sign}(R) \cdot 0.62 \cdot \left(1 - \exp(-R^2 / 0.25)\right) \qquad (5) $$

the phase distortion $\phi[R]$ (fig.5.b) is implemented by an even polynomial with 9 coefficients.
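The AM/AM relation above is simple enough to write down directly. The following is a minimal sketch, assuming NumPy; the function names are hypothetical. The gradient and the inverse are derived analytically from the stated curve and are not given in the paper. The inverse is only defined for output magnitudes strictly below 0.62. The AM/PM polynomial is not reproduced because its coefficients are not listed.

```python
import numpy as np

A_SAT = 0.62      # saturation level of the normalised AM/AM curve
R2_SCALE = 0.25   # scale of the squared amplitude inside the exponential

def am_am(r):
    """AM/AM curve: sign(R) * 0.62 * (1 - exp(-R^2 / 0.25))."""
    r = np.asarray(r, dtype=float)
    return np.sign(r) * A_SAT * (1.0 - np.exp(-r**2 / R2_SCALE))

def am_am_gradient(r):
    """Analytic derivative dF/dR (an even function of R)."""
    r = np.asarray(r, dtype=float)
    return A_SAT * (2.0 * np.abs(r) / R2_SCALE) * np.exp(-r**2 / R2_SCALE)

def am_am_inverse(y):
    """Ideal amplitude predistortion F^{-1}, valid for |y| < 0.62."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.sqrt(-R2_SCALE * np.log(1.0 - np.abs(y) / A_SAT))
```

Applying `am_am` to the output of `am_am_inverse` returns the original magnitude, which is the behaviour an ideal amplitude predistortion is expected to approach.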

The actual HPA output (eq.4) makes evident the fact that these curves completely characterise the HPA and, moreover, that they can be seen as lowpass equivalent transformations. Thus, as shown in figure (3) for a discrete system, the predistortion design allows a lowpass formulation that is useful not only to find the design equations, but also for the simulations, since no real data are available.

The amplitude predistortion, denoted by $g_{A/A}[\cdot]$, can be designed to minimise the amplitude error $e_R(k)$, defined as the difference between the desired magnitude $R_k$ and the HPA AM/AM output (note that the HPA amplification is a scale factor, not included in the predistortion design).

$$ e_R(k) = R_k - F[\hat{R}_k] = R_k - F[g_{A/A}[R_k]] \qquad (6) $$

Thus, ideally, the relation $g_{A/A}[R]$ should be the inverse transformation of F[R]. With respect to the phase predistortion, denoted by the function $g_{A/P}[\cdot]$, note that the minimisation of the phase error,

$$ e_\theta(k) = g_{A/P}[\hat{R}_k] + \phi[\hat{R}_k] = 0 \;\Rightarrow\; g_{A/P}[\hat{R}_k] = -\phi[\hat{R}_k] \qquad (7) $$

leads to a basic identification problem, since the phase predistortion is applied after the amplitude predistortion and is therefore driven by its output $\hat{R}_k$.

It is important to remark that, although some noise will be present at the output of the actual HPA, the predistortion design equations derived from expressions (6) and (7) are useful since the main goal of the predistortion is to compensate nonlinearities, regardless of the noise. In the simulations, an additive Gaussian noise at the output of the HPA curves will be considered, with a high SNR (as usually happens in terrestrial RF transmission) that allows a good performance of the proposed predistortion design.

3.1. The Predistortion Models


The odd symmetry of the AM/AM curve determines the model that implements the amplitude predistortion. In fact, the function $g_{A/A}[\cdot]$ should also follow an odd relation, and two possible models arise from the respective Volterra or Fourier series.

$$ g_{A/A}[R_k] = \sum_{n=1}^{N} a_n^V(k)\, R_k^{2n-1} = (a_k^V)^T u_k^V \qquad (8.a) $$

$$ g_{A/A}[R_k] = \sum_{n=1}^{N} a_n^F(k) \sin(n \omega_0 R_k) = (a_k^F)^T u_k^F \qquad (8.b) $$


The linear dependence of both models on their coefficients allows a vector notation in terms of the coefficient vector $a_k$ and the data vector $u_k$, which gathers the power functions for the Volterra model and the sine functions for the Fourier-Sine model. The coefficients in both models are time dependent since they are modified during the learning stage in order to minimise the power of the amplitude error. In [5], a gradient-type adaptive algorithm is considered to update the value of the coefficients in the opposite direction of the instantaneous gradient. The resulting adaptive algorithm is called the predistortion LMS (PLMS) algorithm and it involves the gradient of the AM/AM characteristic of the HPA with respect to the input value.


$$ a(k+1) = a(k) + \mu\, e_R(k)\, \frac{dF(x)}{dx}\, u(k) \qquad (9) $$


In  the  simulations,  the  exact  gradient  of  the  proposed 
function  F[.]  has  been  used  although  in  a  real  situation  it 
can  be  also  estimated  in  sections  and  stored  in  a  table. 
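A PLMS-style learning loop for the Fourier-Sine amplitude predistortion can be sketched as follows, assuming NumPy. This is only an illustration under stated assumptions: the derivative of F is evaluated at the predistorter output, the step size `mu = 0.1`, the nonzero initial coefficient vector and the uniform training amplitudes are hypothetical choices that are not taken from the paper, and no noise is added. A nonzero start is used because the gradient of the AM/AM curve vanishes at zero input.

```python
import numpy as np

A_SAT, R2_SCALE = 0.62, 0.25
OMEGA0 = np.pi / (2 * A_SAT)   # principal frequency for inputs bounded to 0.62

def hpa_am_am(x):
    return np.sign(x) * A_SAT * (1.0 - np.exp(-x**2 / R2_SCALE))

def hpa_am_am_grad(x):
    return A_SAT * (2.0 * np.abs(x) / R2_SCALE) * np.exp(-x**2 / R2_SCALE)

def sine_vector(r, n_coef):
    """Data vector u(k): sines of the first n_coef harmonics."""
    return np.sin(np.arange(1, n_coef + 1) * OMEGA0 * r)

def plms_train(r_samples, a_init, mu):
    """Gradient update a <- a + mu * e_R * F'(x) * u with x = a^T u."""
    a = np.array(a_init, dtype=float)
    for r in r_samples:
        u = sine_vector(r, a.size)
        x = a @ u                          # predistorter output
        e = r - hpa_am_am(x)               # amplitude error
        a += mu * e * hpa_am_am_grad(x) * u
    return a

def grid_mse(a, grid):
    """Mean squared amplitude error of a fixed coefficient vector on a grid."""
    out = np.array([hpa_am_am(a @ sine_vector(r, a.size)) for r in grid])
    return float(np.mean((grid - out) ** 2))
```

A possible usage is to draw training amplitudes with `np.random.default_rng(0).uniform(0.0, 0.55, 20000)`, start from `[0.5, 0, 0, 0, 0]` and compare `grid_mse` before and after `plms_train`.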

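Similarly, the AM/PM curve follows an even relation and thus the phase predistortion system should also be even. An even memoryless Volterra model (eq.10.a) and a Fourier-Cosine model (eq.10.b), both of order N, are proposed to implement the phase predistortion denoted by $g_{A/P}[\cdot]$.

$$ g_{A/P}[\hat{R}_k] = \sum_{n=0}^{N} b_n^V(k)\, \hat{R}_k^{2n} = (b_k^V)^T v_k^V \qquad (10.a) $$

$$ g_{A/P}[\hat{R}_k] = \sum_{n=0}^{N} b_n^F(k) \cos(n \omega_0 \hat{R}_k) = (b_k^F)^T v_k^F \qquad (10.b) $$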

The coefficients $b_k$ involved in the phase predistortion models can be updated by classical adaptive methods (the NLMS algorithm is proposed), since the design problem is a basic identification problem with a model that is linear in its coefficients.

$$ b(k+1) = b(k) + \frac{\mu}{p(k)}\, e_\theta(k)\, v(k) \qquad (11) $$

The term p(k) denotes the estimated input power, approximated by a lowpass filtering of the input data vector $v_k$ with a given memory factor. It is interesting to note that, since the output of the amplitude predistortion drives the phase predistortion, the learning of the coefficients of the $g_{A/P}[\cdot]$ model will be conditioned by the learning of the amplitude predistortion system.
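The power-normalised update can be sketched as a plain NLMS identification with a Fourier-Cosine model, assuming NumPy. Several things here are assumptions rather than the authors' settings: the error is taken as target minus model output (the usual identification convention), the power estimate is a first-order recursion on the squared norm of the data vector, and the `memory` value and any target function passed in are hypothetical.

```python
import numpy as np

OMEGA0 = np.pi / 2   # principal frequency for a model input bounded to one

def cosine_vector(r, n_coef):
    """Data vector v(k): cosines of harmonics 0..n_coef-1."""
    return np.cos(np.arange(n_coef) * OMEGA0 * r)

def nlms_identify(r_samples, target, n_coef, mu, memory, p_init):
    """NLMS fit of an even function with power-normalised step mu / p(k)."""
    b = np.zeros(n_coef)
    p = p_init
    for r in r_samples:
        v = cosine_vector(r, n_coef)
        p = memory * p + (1.0 - memory) * (v @ v)   # lowpass power estimate
        e = target(r) - b @ v                       # assumed sign convention
        b += (mu / p) * e * v
    return b
```

In the predistortion setting the target would be the negative of the AM/PM curve evaluated at the amplitude predistorter output; any smooth even function can stand in for it when trying the loop out.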


4. Simulation Results


In this section, the results obtained in the simulation of the adaptive learning of the predistortion (fig.3) are presented. The input signal is a 64-QAM modulation generated from two 8-PAM signals, for the in-phase $I_k$ and the quadrature $Q_k$ components. The resulting magnitude should be less than 0.62, which is the range that can be compensated, since the output of the normalised F[.] function of the BJT transistor is bounded to this range (eq.5). Gaussian noise (SNR = 60 dB) is also added to the output of the HPA distortion.
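The input constellation can be generated along the following lines, assuming NumPy. The function name and the choice of scaling the corner symbol to a given maximum magnitude are assumptions; the paper only states that the magnitude must stay below 0.62.

```python
import numpy as np

def qam64_symbols(n_symbols, max_magnitude, rng):
    """Random 64-QAM symbols built from two independent 8-PAM components."""
    levels = np.arange(-7, 8, 2)                 # 8-PAM levels -7, -5, ..., 7
    i_k = rng.choice(levels, size=n_symbols)     # in-phase component
    q_k = rng.choice(levels, size=n_symbols)     # quadrature component
    scale = max_magnitude / np.hypot(7, 7)       # corner symbol hits the bound
    return scale * (i_k + 1j * q_k)
```

For example, `qam64_symbols(1000, 0.6, np.random.default_rng(0))` yields symbols whose magnitudes $R_k$ and phases $\theta_k$ can be fed to the amplitude and phase predistortion stages.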

Concerning the amplitude predistortion, $g_{A/A}[\cdot]$, two different models are considered: an odd Volterra (oV) model (eq.8.a) and a Fourier-Sine (FS) model (eq.8.b), both with N=5 coefficients. The principal frequency for the Fourier model is chosen equal to $\omega_0 = \pi/(2 \cdot 0.62)$ since the input value R is bounded to 0.62. The coefficients are updated by the PLMS algorithm (eq.9), where the step size parameters are normalised to the power estimate of the respective data vector u ($\mu_{oV}(k) = 0.003/p_{oV}(k)$ with $p_{oV}(-1) = 0$, $\mu_{FS}(k) = 2/N$). At this point it is worth remarking that the diversity managed by the Fourier model, i.e. the sine functions of successive harmonics, has less scattered power values than that of the Volterra model.




This feature, together with other aspects concerning the correlation between the components of the vector $u_k$ [3], usually gives the Fourier model a better performance than the Volterra model in adaptive solutions. A clear example is shown in figure (4.a), which represents the squared amplitude error achieved by both models averaged over 25 independent realisations (the better result is achieved by the FS model). The step-sizes have been tuned after various tests to achieve a similar convergence rate for both models. Additionally, the amplitude predistortion systems implemented by the Fourier model and the Volterra model after the learning stage are included in figure (5.c).


[Figure 4 plots. Panel title: AM/AM Predistortion, Odd Volterra 5 coef., Fourier-Sine 5 coef. Horizontal axis: Time Step.]

Figure  4.  Squared  envelope  error  (a)  and 
squared  phase  error  (b)  averaged  over  25 
realisations. 


Similarly, the averaged squared phase error achieved by the AM/PM predistortion is included in figure (4.b) (the Fourier model also shows the better results). For this nonlinear system, an even Volterra (eV) model and a Fourier-Cosine (FC) model with 9 coefficients are used. The respective coefficients are updated by the NLMS algorithm (eq.11) with the following step-sizes ($\mu_{eV} = 0.05$, $p_{eV}(-1) = 0$; $\mu_{FC} = 0.3$, $p_{FC}(-1) = 1$). In this case, the principal frequency of the Fourier model is chosen equal to $\omega_0 = \pi/2$ because the input to our model, denoted by $\hat{R}_k$, is bounded to one. At the beginning of the learning, this assumption does not hold and the phase predistortion learning could run into trouble. Thus, in the simulations, the output of the amplitude predistortion system is forced to be less than one in order to avoid this problem. From the error performance, it can be seen how the phase predistortion is conditioned by the convergence of the amplitude predistortion, as expected. Finally, the AM/PM predistortion systems implemented by both models after the learning stage are also shown in figure (5.d), where the superior performance of the Fourier model becomes evident.




Figure 5. (a) HPA AM/AM relation. (b) HPA AM/PM relation. (c) Ideal amplitude predistortion (solid) and final amplitude predistortion of the Fourier-Sine model (dotted) and odd Volterra model (dashed). (d) Ideal phase predistortion (solid) and final phase predistortion of the Fourier-Cosine model (dotted) and even Volterra model (dashed).


Remarks 

In  this  paper  the  HPA  predistortion  implemented  by  a 
Fourier  model  is  compared  with  the  performance  achieved 
by  the  classical  solution  of  using  polynomial  models. 
Although  the  Fourier  model  requires  more  computational 
load  than  the  Volterra  model,  the  existing  fast  DSP 
processors  as  well  as  the  considerably  superior 
performance  achieved  in  this  particular  problem  by  the 
Fourier  model  seem  to  justify  the  use  of  this  last  one. 

Abstract

We consider the identification of a class of multiple input-output nonlinear systems when the inputs are stationary non-Gaussian processes. Currently, very few identification techniques exist to solve this complicated problem. In an attempt to provide a solution, we extend the single input-output Hammerstein series to a multiple input-output version. Our solution for the multiple input-output problem in the non-Gaussian case is mathematically tractable and computationally attractive. Real data experiments are shown to indicate the usefulness of the method.


1.  Introduction 


The identification and analysis of multiple input-output systems is a problem of practical importance, which finds special application in seismic and array processing, physiological modelling, and vibration analysis (e.g., [1, 2, 4]). Most multiple input-output system identification procedures are based on the assumptions of linearity and Gaussianity, which inevitably exhibit limitations and weaknesses in practice. This is problematic as there are many multiple input-output nonlinear systems where the inputs are non-Gaussian processes (e.g., [4, 1, 9]). However, very few computationally efficient techniques exist to solve this identification problem.

A multiple input-output Volterra model has been proposed [1], but this can lead to an unwieldy multiple input-output system description. In addition, the Volterra approach necessitates large computational requirements in estimation. In an attempt to overcome these problems, we consider extending the single input-output Hammerstein series model (see [8]) to the multiple input-output case. The Hammerstein series has been shown to exhibit distinct practical advantages over the Volterra series in the non-Gaussian case. We derive solutions for the multiple input-output problem using the Hammerstein series and also derive multiple nonlinear coherence functions. The approach represents a mathematically tractable and computationally attractive solution to the multiple input-output nonlinear system identification problem in the non-Gaussian case.


2.  The  Identification  Procedure 

2.1  The  Multiple  Input-Output  Model 

We consider the multiple-input multiple-output (MIMO) nonlinear system, as it represents the most general configuration of the four multiple input-output scenarios (i.e., the single/multiple input-output configurations). We define the m-input, k-output, nth order discrete-time time-invariant Hammerstein series as

$$ Y_r(t) = \sum_{q=1}^{n} \sum_{p=1}^{m} \sum_{\tau=-\infty}^{\infty} g_{prq}(\tau)\, X_p(t-\tau)^q + N_r(t), \qquad (1) $$

with inputs $X_p(t)$, p = 1, 2, ..., m, and outputs $Y_r(t)$, r = 1, 2, ..., k, and where the Hammerstein kernel $g_{prq}(\tau)$ relates to the pth input, rth output, and qth nonlinearity, respectively, for q = 1, 2, ..., n. Note also that we have allowed for an additive, zero-mean disturbance signal $N_r(t)$, where we assume that $X_p(t)$ and $N_r(t)$ are independent for all p, r and that $N_r(t)$ and $N_s(t)$ are independent for all s, r = 1, 2, ..., k, $s \neq r$, and all t. We also assume that the inputs are real, zero-mean, and stationary processes with bounded cumulants up to 2nth order. Equation (1) represents an extension of the Hammerstein series which has been successfully applied in the single-input single-output (SISO) case.

In order to simplify the formulation of the problem, consider a vectorial version of the MIMO nonlinear system in (1), i.e.,

$$ Y(t) = \sum_{q=1}^{n} \sum_{\tau=-\infty}^{\infty} g_q(\tau)'\, X^{\odot q}(t-\tau) + N(t), $$

where X(t), Y(t), and N(t) respectively represent the m, k, and k vector-valued input, output and noise processes, and where the qth order [m × k] MIMO Hammerstein kernel matrix $g_q(\tau)$ represents the collection of all Hammerstein kernels of order q, i.e., the p,rth element of $g_q(\tau)$ is $g_{prq}(\tau)$. The notation $X^{\odot q}(t)$ represents the q-fold Hadamard product of X(t) with itself, i.e., the pth element of $X^{\odot q}(t)$ is $X_p(t)^q$. The vectorial notation conveniently enables us to express the MIMO model in a similar manner to the SISO nonlinear case [8]. Note however that unlike most other nonlinear system identification procedures in the non-Gaussian case, we do not linearise the model in finding a solution (e.g., cf. [5, 6]).
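The vectorial model can be simulated with a short sketch, assuming NumPy. To keep it finite, the kernels are taken to be causal with a finite number of lags and the noise term is omitted; both are simplifying assumptions, and the function name and array layout are hypothetical.

```python
import numpy as np

def mimo_hammerstein(x, kernels):
    """Noise-free output of a MIMO Hammerstein series with finite causal memory.

    x       : array (T, m), one row per time step, one column per input.
    kernels : array (n, L, m, k); kernels[q-1, tau] is the [m x k] matrix g_q(tau).
    Returns an array (T, k) with one column per output.
    """
    n_order, n_lags, _, k = kernels.shape
    t_len = x.shape[0]
    y = np.zeros((t_len, k))
    for q in range(1, n_order + 1):
        xq = x ** q                                  # q-fold Hadamard power
        for tau in range(n_lags):
            # rows t >= tau receive g_q(tau)' applied to the powered input at t - tau
            y[tau:] += xq[:t_len - tau] @ kernels[q - 1, tau]
    return y
```

Each output is a sum of linearly filtered elementwise powers of the inputs, which is what keeps the number of kernel coefficients small compared with a Volterra description.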

2.2 Solving for the MIMO Hammerstein Kernels

Since we are considering the identification of a nonlinear system in the non-Gaussian case, our solution naturally involves the use of higher order cumulant sequences [3]. We are able to greatly simplify the formulation of the solution by only requiring a particular slice of the cumulant sequence instead of the usual cumulant sequences [8]. The required (p + q)th multivariate cumulant sequence slice, $c_{X^p X^q}(\tau)$, is defined as

$$ c_{X^p X^q}(\tau) \triangleq \mathrm{cum}\{X^{\odot p}(t),\, X^{\odot q}(t-\tau)'\}, $$

for p, q = 1, 2, ..., n, where cum represents the cumulant operator and ' is the matrix transpose operator. Note that the above cumulant sequence slice involves products of multivariate random processes, which can be expressed in terms of (p + q)th and lower order cumulant sequences (see [3]).

A minimum mean-square error criterion for the MIMO nth order Hammerstein kernels leads to the set of [n × n] linear (block) matrix equations

$$ c_{Y X^u}(v) = \sum_{q=1}^{n} \sum_{\tau=-\infty}^{\infty} g_q(\tau)'\, c_{X^q X^u}(v - \tau), \qquad (2) $$

for u = 1, 2, ..., n. Taking the Fourier transform of (2) with respect to v gives

$$ C_{Y X^u}(\omega) = \sum_{q=1}^{n} G_q(\omega)'\, C_{X^q X^u}(\omega), \qquad (3) $$

for u = 1, 2, ..., n, where $C_{X^q X^u}(\omega)$ is the one-dimensional (integrated) polyspectral representation [8] corresponding to the Fourier transform of $c_{X^q X^u}(\tau)$, and $G_q(\omega)$ is the Fourier transform of $g_q(\tau)$ with respect to $\tau$, which we call the Hammerstein transfer functions. The key result to note is that the MIMO Hammerstein transfer functions have separated from the multivariate integrated polyspectra (cf. the Volterra series in the non-Gaussian case). Using the Hammerstein series greatly reduces the computational requirements in estimation. Simultaneously solving (3) leads to optimal mean-square solutions for $G_q(\omega)$, q = 1, 2, ..., n:

$$ \begin{bmatrix} G_1(\omega) \\ G_2(\omega) \\ \vdots \\ G_n(\omega) \end{bmatrix} = \begin{bmatrix} C_{XX}(\omega)' & \cdots & C_{X^nX}(\omega)' \\ C_{XX^2}(\omega)' & \cdots & C_{X^nX^2}(\omega)' \\ \vdots & & \vdots \\ C_{XX^n}(\omega)' & \cdots & C_{X^nX^n}(\omega)' \end{bmatrix}^{-1} \begin{bmatrix} C_{YX}(\omega)' \\ C_{YX^2}(\omega)' \\ \vdots \\ C_{YX^n}(\omega)' \end{bmatrix} \qquad (4) $$

Thus we have derived a general solution for multiple input-output nth order nonlinear system identification in the general non-Gaussian case. Note that we do not perform a discrete frequency regression as in [5], but solve (4) with respect to the Hammerstein transfer functions as a continuous function of $\omega$.
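At a single frequency, the block system can be solved numerically as sketched below, assuming NumPy. The function name and the nested-list layout of the polyspectral blocks are hypothetical, and the prime is implemented as a plain transpose, as stated in the text. A linear solve is used instead of an explicit matrix inverse, which is a numerical choice made here rather than something the paper prescribes.

```python
import numpy as np

def solve_hammerstein_transfer(c_xx, c_yx):
    """Solve the block system for G_1..G_n at one frequency.

    c_xx[q][u] : [m x m] array, polyspectrum between X^(q+1) and X^(u+1).
    c_yx[u]    : [k x m] array, polyspectrum between Y and X^(u+1).
    Returns a list of n arrays, each [m x k].
    """
    n = len(c_yx)
    m = c_yx[0].shape[1]
    # block row u collects the transposed blocks for q = 1..n
    lhs = np.block([[c_xx[q][u].T for q in range(n)] for u in range(n)])
    rhs = np.vstack([c_yx[u].T for u in range(n)])
    g = np.linalg.solve(lhs, rhs)            # stacked [nm x k] solution
    return [g[q * m:(q + 1) * m] for q in range(n)]
```

Repeating the call over a grid of frequencies gives the estimated transfer functions once the polyspectral blocks have been estimated.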

2.3  MIMO  Nonlinear  Coherence  Function 

The coherence function is very important in system identification as it provides a practical mechanism for model validation and system analysis. We use the notion of system coherency [5, 8] to derive a MIMO nonlinear coherence function. A block matrix version of the MIMO model is introduced so that the derivation of the MIMO nonlinear coherence function is not obscured by notation. Let the system of block matrix equations in (2) be expressed as

$$ \mathcal{C}_{YX}(\omega) = \mathcal{G}(\omega)\, \mathcal{C}_{XX}(\omega), \qquad (5) $$

where $\mathcal{C}_{YX}(\omega)$ and $\mathcal{C}_{XX}(\omega)$ are [k × mn] and [mn × mn] block matrices, respectively. The above equation leads to the general solution for $\mathcal{G}(\omega)$ as in (4),

$$ \mathcal{G}(\omega) = \mathcal{C}_{YX}(\omega)\, \mathcal{C}_{XX}(\omega)^{-1}. \qquad (6) $$

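Let the [k × k] output spectral density matrix of the MIMO nonlinear system, $\mathcal{C}_{YY}(\omega)$, be denoted by

$$ \mathcal{C}_{YY}(\omega) = \mathcal{G}(\omega)\, \mathcal{C}_{XX}(\omega)\, \mathcal{G}(\omega)^H + \mathcal{C}_{NN}(\omega), \qquad (7) $$

where the dimensions of (7) are [k × k] = [k × mn] [mn × mn] [mn × k]. An expression for the [k × k] MIMO nonlinear coherence function is found by substituting the expression for $\mathcal{G}(\omega)$ in (6) into (7). Thus the MIMO nonlinear coherence matrix, $\mathcal{R}(\omega)$, is given by

$$ \mathcal{R}(\omega) = \mathcal{C}_{YY}(\omega)^{-1}\, \mathcal{C}_{YX}(\omega)\, \mathcal{C}_{XX}(\omega)^{-1}\, \mathcal{C}_{YX}(\omega)^H, \qquad (8) $$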

where the above matrix dimensions are [k × k] = [k × k] [k × mn] [mn × mn] [mn × k]. This yields an expression for the MIMO nonlinear coherence function. Since the MIMO nth order Hammerstein series has k outputs, it follows that $\mathcal{R}(\omega)$ is a [k × k] matrix with elements

$$ \mathcal{R}(\omega) = \begin{bmatrix} R_{11}(\omega) & R_{12}(\omega) & \cdots & R_{1k}(\omega) \\ R_{21}(\omega) & R_{22}(\omega) & \cdots & R_{2k}(\omega) \\ \vdots & & \ddots & \vdots \\ R_{k1}(\omega) & \cdots & & R_{kk}(\omega) \end{bmatrix}, $$

where $R_{uv}(\omega)$, u, v = 1, 2, ..., k, represents the nth order coherence function of the MIMO model. Thus a solution for the MIMO nth order coherence function has been derived which does not depend on any unknown MIMO Hammerstein transfer functions.

2.4  Discussion 

Special  Cases.  It  is  straightforward  to  show  how  the  MIMO  model 
in  (4)  reduces  to  the  single-input  multiple-output,  multiple-input 
single-output,  and  single-input  single-output  configurations,  i.e., 
m  =  1,  r  =  1,  and  m  =  r  =  1,  respectively. 

Parameterisation Considerations. Although the Hammerstein series is not as general a mathematical model as the Volterra series, its use leads to significant reductions in the number of coefficients required in system modelling. As a simple example of the parameterisation advantages achieved in using the Hammerstein series, consider a two-input, two-output cubically nonlinear system with a system memory of M = 10 lags. To model this system, the Hammerstein series requires Mnkm = 120 coefficients, whereas the Volterra series requires a total of $\sum_{q=1}^{n} \frac{(M+q-1)!}{(M-1)!\,q!}\,km = 1140$ distinct coefficients.
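The two counts in this example can be checked with a few lines of Python. The function names are hypothetical; the formulas are the ones quoted in the paragraph above, with the factorial ratio written as a binomial coefficient.

```python
from math import comb

def hammerstein_coefficients(memory, order, n_outputs, n_inputs):
    """M * n * k * m coefficients for the MIMO Hammerstein series."""
    return memory * order * n_outputs * n_inputs

def volterra_coefficients(memory, order, n_outputs, n_inputs):
    """Sum over q of (M+q-1)! / ((M-1)! q!), times k * m."""
    distinct = sum(comb(memory + q - 1, q) for q in range(1, order + 1))
    return distinct * n_outputs * n_inputs

print(hammerstein_coefficients(10, 3, 2, 2))   # 120
print(volterra_coefficients(10, 3, 2, 2))      # 1140
```

For M = 10 the per-order Volterra terms are 10, 55 and 220, so the gap widens quickly as the nonlinearity order grows.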

2.5 Estimation

We make estimates of the multivariate integrated polyspectra using an averaged periodogram based approach [3]. A similar approach is used for the univariate case in [8, 10].

We assume that the input-output sequences X(t) and Y(t) are available for t = 0, ..., T-1. Given that the data is stationary, we segment the sequences into M stretches each of length N, denoted by X(t, m) and Y(t, m), respectively, for m = 1, ..., M, such that T = MN. The finite Fourier transform of $X(t, m)^n$ is given by [7]

$$ d_{X^n}(\omega, m) = \sum_{t=0}^{N-1} \left(X(t, m) - c_{X_m}\right)^n e^{-j\omega t}, \quad n = 1, 2, $$

where $c_{X_m}$ is the sample mean of X(t, m). For the third-order case, the cumulant-moment relationship is given by

$$ c_{X^2 X}(\tau) = \mathrm{cum}\{X(t)X(t),\, X(t-\tau)\} = E\{X(t)^2 X(t-\tau)\} $$

for E{X(t)} = 0, which represents the sliced third order cumulant sequence of X(t). This expression suggests the estimator (9), which is a form of cross-periodogram. An estimate for $C_{X^2 X}(\omega)$ is found by averaging over the M terms in (9), and smoothing with an appropriate weighting function in the frequency domain [7]. In using the weighting functions we assume that the spectra have some smoothness properties. The large sample properties of this class of estimate are discussed in [3].

Estimates of $C_{XX}(\omega)$, $C_{XX^2}(\omega)$, $C_{X^2X^2}(\omega)$, $C_{YX}(\omega)$ and $C_{YX^2}(\omega)$ are found in a similar manner to the above. The estimates of the Hammerstein transfer functions and the nonlinear coherence function are subsequently found by substituting the estimated polyspectra into (4) and (8), respectively, for a given n, k and m.
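The segment-averaging step can be sketched for the univariate case as follows, assuming NumPy. The specific form used here, the product of one finite Fourier transform with the conjugate of the other divided by the segment length, is an assumption about the cross-periodogram and not a quotation of the paper's estimator; the frequency-domain smoothing is left out, and the function name is hypothetical.

```python
import numpy as np

def averaged_cross_periodogram(a, b, p, q, n_segments):
    """Average over segments of d_{a^p}(w) * conj(d_{b^q}(w)) / N.

    Each segment has its sample mean removed before being raised to the
    power p or q. Values are returned at the FFT frequencies 2*pi*j/N.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    seg_len = a.size // n_segments
    acc = np.zeros(seg_len, dtype=complex)
    for s in range(n_segments):
        sl = slice(s * seg_len, (s + 1) * seg_len)
        a_seg = (a[sl] - a[sl].mean()) ** p
        b_seg = (b[sl] - b[sl].mean()) ** q
        acc += np.fft.fft(a_seg) * np.conj(np.fft.fft(b_seg)) / seg_len
    return acc / n_segments
```

For instance, `averaged_cross_periodogram(x, x, 2, 1, M)` gives a raw estimate associated with the sliced third order cumulant of x, and `averaged_cross_periodogram(y, x, 1, 2, M)` the one between the output and the squared input.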

3.  Engine  Transmission  Modelling 

We first verified the multiple input-output system identification technique using a simulated nonlinear system where we obtained good results. We then applied the method to a practical identification problem relating to engine knock (see [8, 12]).

An effective means for lowering fuel consumption and improving the efficiency of a combustion engine is to increase the compression ratio. However, this also increases the occurrence of an abnormal combustion phenomenon called knock. Knock needs to be avoided as it results in an excessively noisy, over-heated and inefficient engine. If the knocking condition can be detected, then it can be minimised without adversely affecting overall engine efficiency. The knocking condition can be detected by placing relatively inexpensive vibration sensors on the engine housing. Previously, a SISO quadratic Volterra series has been used to model the engine transmission characteristics between the cylinder pressure signal and a structural vibration signal [11].


We propose the use of a single-input two-output quadratically nonlinear Hammerstein series model (i.e., m = 1, r = 2, n = 2 in (1)) to model the transmission characteristics of a combustion engine operating in a knocking condition. Note that we do not attempt to solve the engine knock problem here, but rather focus on the application of the MIMO Hammerstein series as a nonlinear model. The cylinder pressure and engine vibration cycles measured from the engine were used as the system's input and output signals respectively.¹ A 1.8 l, 4-cylinder engine operating under strong knocking conditions at full load was used in the experiment.

A schematic of the assumed quadratically nonlinear system is shown in Figure 1, and has an input-output relationship given by

$$ Y_r(t) = \sum_{\tau=-\infty}^{\infty} g_{r1}(\tau) X(t-\tau) + \sum_{\tau=-\infty}^{\infty} g_{r2}(\tau) X(t-\tau)^2 + N_r(t) $$

with input X(t) and outputs $Y_r(t)$ for r = 1, 2. We assume that the additive noise terms $N_r(t)$ are zero-mean and stationary, and that X(t) and $N_r(t)$ are independent. The cylinder pressure and structural vibration signals were used to form estimates of the quadratically nonlinear Hammerstein transfer functions and the quadratic coherence function.

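Using the results from Section 2, explicit solutions for the Hammerstein transfer functions are given by

$$ G_{11}(\omega) = \frac{C_{X^2X^2}(\omega)\, C_{Y_1X}(\omega) - C_{X^2X}(\omega)\, C_{Y_1X^2}(\omega)}{C_{XX}(\omega)\, C_{X^2X^2}(\omega) - C_{XX^2}(\omega)\, C_{X^2X}(\omega)}, $$

$$ G_{12}(\omega) = \frac{C_{XX}(\omega)\, C_{Y_1X^2}(\omega) - C_{XX^2}(\omega)\, C_{Y_1X}(\omega)}{C_{XX}(\omega)\, C_{X^2X^2}(\omega) - C_{XX^2}(\omega)\, C_{X^2X}(\omega)}, $$

$$ G_{21}(\omega) = \frac{C_{X^2X^2}(\omega)\, C_{Y_2X}(\omega) - C_{X^2X}(\omega)\, C_{Y_2X^2}(\omega)}{C_{XX}(\omega)\, C_{X^2X^2}(\omega) - C_{XX^2}(\omega)\, C_{X^2X}(\omega)}, $$

$$ G_{22}(\omega) = \frac{C_{XX}(\omega)\, C_{Y_2X^2}(\omega) - C_{XX^2}(\omega)\, C_{Y_2X}(\omega)}{C_{XX}(\omega)\, C_{X^2X^2}(\omega) - C_{XX^2}(\omega)\, C_{X^2X}(\omega)}. $$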

The above results represent an extremely efficient method for computing the MIMO Hammerstein transfer functions as the solution is in closed form (cf. matrix inversion techniques [5]). The quadratic coherence function associated with the quadratic model is found in a similar manner. Figure 2 shows estimates of the linear and quadratic Hammerstein series kernels for the two paths of the model, and the linear and quadratic coherences for the model. The closeness of the coherence function to unity indicates the general goodness of fit of the model. We found that the quadratic model provided a better characterisation of the engine block than a single-input multiple-output linear model.

4.  Summary 

We have formulated a procedure for identifying a class of multiple input-output nth order nonlinear systems when the inputs are stationary non-Gaussian processes. We have validated the model by deriving a multiple input-output nonlinear coherence function. The solution has been formulated using special forms of polyspectra, which significantly simplifies the estimation and implementation of the model. In addition, we have avoided the parameterisation issues associated with the Volterra series by using the Hammerstein series as the system model. The solution represents a simple and practical approach to a difficult system identification problem. The technique has been validated with an application to an automotive engineering problem.

¹Acknowledgement: We would like to thank Professor J. F. Bohme from the Signal Theory Division of Ruhr University Bochum and Volkswagen AG, Wolfsburg, Germany, for kindly providing the knock data used in the paper.
Figure 1. The single-input dual-output quadratically nonlinear Hammerstein series used as the system model.

Figure 2. (a) Hammerstein transfer functions: path 1 (top) and path 2 (bottom), showing linear (solid line) and quadratic (dashed line) transfer functions; (b) ordinary linear (solid line) and quadratic (dashed line) coherences for path 1; (c) ordinary linear (solid line) and quadratic (dashed line) coherences for path 2.
Abstract

In this paper closed form expressions for the identification
of second order Volterra systems are developed.
Two main cases are considered. The first case imposes
natural constraints on the Volterra kernels leaving the
input signal quite general. The second case leaves the
kernels in general form but puts constraints on the input.
In particular, signals obtained as outputs of linear
systems driven by higher order white noise are considered.


1.  Introduction 

We will be concerned with second order Volterra systems of the form

$$y(n) = \sum_{k_1=0}^{\infty} h_1(k_1)u(n-k_1) + \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty} h_2(k_1,k_2)u(n-k_1)u(n-k_2) + \eta(n) \qquad (1)$$

The disturbance $\eta(n)$ and the input $u(n)$ are independent zero
mean processes. The Volterra kernels $h_1(k_1)$, $h_2(k_1,k_2)$
are causal, absolutely summable, symmetric sequences.
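
As an illustration of the model (1), the following sketch evaluates the output of a second order Volterra system. It assumes kernels truncated to a finite memory L (the paper allows infinite sums) and zero input before the first sample; the function name and the optional noise argument are hypothetical.

```python
import numpy as np

def volterra2(u, h1, h2, noise=None):
    """Output of a second order Volterra system with finite memory L.

    h1 has shape (L,), h2 has shape (L, L) and is assumed symmetric.
    Inputs before the first sample are taken as zero (an assumption).
    """
    u = np.asarray(u, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    L = len(h1)
    y = np.zeros(len(u))
    for n in range(len(u)):
        # Window of past inputs: u(n), u(n-1), ..., u(n-L+1).
        w = np.array([u[n - k] if n - k >= 0 else 0.0 for k in range(L)])
        # Linear term plus quadratic form in the window.
        y[n] = h1 @ w + w @ h2 @ w
    if noise is not None:
        y = y + np.asarray(noise, dtype=float)
    return y
```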

Closed form expressions for the Volterra kernels, for
a second order system, have been determined when the
input is a zero mean stationary Gaussian process [1].
The general p-th order system with the same assumptions
for the input is treated in [2]. These expressions
utilize cumulant information in the time or frequency
domain. Recall that if $x(k)$ is a stationary discrete
time random process then the p-th order cumulant
of $x(k)$, denoted $c_p^x(k_1, k_2, \ldots, k_{p-1})$, is defined as the
joint p-th order cumulant of the random variables
$x(k), x(k+k_1), \ldots, x(k+k_{p-1})$, i.e.,

$$c_p^x(k_1, k_2, \ldots, k_{p-1}) = \mathrm{cum}\big(x(k), x(k+k_1), \ldots, x(k+k_{p-1})\big)$$

The p-th order polyspectrum is the (p-1)-dimensional
discrete time Fourier transform of the
p-th order cumulant $c_p^x(k_1, k_2, \ldots, k_{p-1})$. Cross-cumulants
and cross-polyspectra of two jointly stationary
stochastic processes are similarly defined.

The Gaussian assumption is not always realistic. Attempts
to handle the non-Gaussian case for kernels of
compact support are presented in [3]. This paper derives
closed form expressions for the identification of
the Volterra kernels of (1) in two important cases: a)
banded Volterra kernels and general inputs, b) general
Volterra kernels and inputs of special type.

2 Formulation as a Fredholm integral equation

We first compute the cross-cumulants of y with one
and two copies of the input, respectively. Using the
properties of cumulants and the Leonov-Shiryaev theorem
[4], we obtain

$$\mathrm{cum}[y(n), u(n-s_1)] = \sum_{k_1} h_1(k_1)\,\mathrm{cum}[u(n-k_1), u(n-s_1)] + \sum_{k_1}\sum_{k_2} h_2(k_1,k_2)\,\mathrm{cum}[u(n-k_1), u(n-k_2), u(n-s_1)]$$

$$\mathrm{cum}[y(n), u(n-s_1), u(n-s_2)] = \sum_{k_1} h_1(k_1)\,\mathrm{cum}[u(n-k_1), u(n-s_1), u(n-s_2)]$$
$$\quad + \sum_{k_1}\sum_{k_2} h_2(k_1,k_2)\,\mathrm{cum}[u(n-k_1), u(n-k_2), u(n-s_1), u(n-s_2)]$$
$$\quad + 2\sum_{k_1}\sum_{k_2} h_2(k_1,k_2)\,\mathrm{cum}[u(n-k_1), u(n-s_1)]\,\mathrm{cum}[u(n-k_2), u(n-s_2)]$$

Passing  to  frequency  domain  we  obtain 
$$C_{yu}(-w) = H_1(w)C_2^u(w) + \frac{1}{2\pi}\int_{-\pi}^{\pi} H_2(w-w_3, w_3)\,C_3^u(w-w_3, w_3)\,dw_3 \qquad (2)$$

$$C_{yuu}(-w_1, -w_2) = H_1(w_1+w_2)\,C_3^u(-w_1, -w_2) + 2H_2(w_1, w_2)\,C_2^u(w_1)C_2^u(w_2)$$
$$\quad + \frac{1}{2\pi}\int_{-\pi}^{\pi} H_2(w_1+w_2-w_3, w_3)\,C_4^u(-w_2, w_1+w_2-w_3, w_3)\,dw_3 \qquad (3)$$

Let $w \in [-\pi, \pi]$. Note that as long as we move on
the line $w_1 + w_2 = w$, eq. (3) takes the form

$$C_{yuu}(-(w-w_2), -w_2) = H_1(w)\,C_3^u(-(w-w_2), -w_2) + 2H_2(w-w_2, w_2)\,C_2^u(w-w_2)C_2^u(w_2)$$
$$\quad + \frac{1}{2\pi}\int_{-\pi}^{\pi} H_2(w-w_3, w_3)\,C_4^u(-w_2, w-w_3, w_3)\,dw_3 \qquad (4)$$

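Solving eq. (2) with respect to $H_1(w)$ and substituting
into eq. (4) we obtain

$$H_1(w) = \frac{C_{yu}(-w)}{C_2^u(w)} - \frac{1}{C_2^u(w)}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} H_2(w-w_3, w_3)\,C_3^u(w-w_3, w_3)\,dw_3 \qquad (5)$$

$$H_2(w-w_2, w_2) - \Big(-\frac{1}{2\pi}\Big)\int_{-\pi}^{\pi} H_2(w-w_3, w_3)\,\frac{C_4^u(-w_2, w-w_3, w_3)\,C_2^u(w) - C_3^u(-(w-w_2), -w_2)\,C_3^u(w-w_3, w_3)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)}\,dw_3$$
$$\quad = \frac{C_{yuu}(-(w-w_2), -w_2)}{2C_2^u(w-w_2)\,C_2^u(w_2)} - \frac{C_{yu}(-w)\,C_3^u(-(w-w_2), -w_2)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)} \qquad (6)$$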

We observe that eq. (6) is a Fredholm integral equation
of the second kind of the form

$$x(t) - \lambda\int_a^b K(t, \xi)\,x(\xi)\,d\xi = f(t)$$

The 2-D function $K(t, \xi)$ is the so called kernel of
the integral equation. In our case

$$x(t) = H_2(w-w_2, w_2), \qquad \lambda = -\frac{1}{2\pi},$$

$$K(w_2, w_3) = \frac{C_4^u(-w_2, w-w_3, w_3)\,C_2^u(w) - C_3^u(-(w-w_2), -w_2)\,C_3^u(w-w_3, w_3)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)},$$

$$f(w_2) = \frac{C_{yuu}(-(w-w_2), -w_2)}{2C_2^u(w-w_2)\,C_2^u(w_2)} - \frac{C_{yu}(-w)\,C_3^u(-(w-w_2), -w_2)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)}.$$

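The solution of these equations can be studied by various
methods, including iterated kernels, successive approximation,
the determinants method and the eigenvalues
method. Approximate expressions are obtained
if the integral is replaced by a finite sum. Then finite
dimensional linear systems of equations result. Here
we confine ourselves to the determinants method. It
can be proved [5] that a necessary and sufficient condition
for the existence of a unique continuous solution
is that Fredholm's determinant

$$\Delta(\lambda) = 1 + \sum_{\nu=1}^{\infty}\frac{(-1)^\nu\lambda^\nu}{\nu!}\int_a^b\cdots\int_a^b K\begin{pmatrix}\xi_1 & \xi_2 & \cdots & \xi_\nu\\ \xi_1 & \xi_2 & \cdots & \xi_\nu\end{pmatrix}d\xi_1\,d\xi_2\cdots d\xi_\nu$$

is not zero. The solution is given by

$$x(t) = f(t) + \lambda\int_a^b \Gamma(t, \xi; \lambda)\,f(\xi)\,d\xi$$

where the kernel $\Gamma(t, \xi; \lambda)$ is

$$\Gamma(t, \xi; \lambda) = \frac{\Delta(t, \xi; \lambda)}{\Delta(\lambda)},$$

$$\Delta(t, \xi; \lambda) = K(t, \xi) + \sum_{\nu=1}^{\infty}\frac{(-1)^\nu\lambda^\nu}{\nu!}\int_a^b\cdots\int_a^b K\begin{pmatrix}t & \xi_1 & \xi_2 & \cdots & \xi_\nu\\ \xi & \xi_1 & \xi_2 & \cdots & \xi_\nu\end{pmatrix}d\xi_1\,d\xi_2\cdots d\xi_\nu$$

The notation

$$K\begin{pmatrix}\xi_1 & \xi_2 & \cdots & \xi_\nu\\ \zeta_1 & \zeta_2 & \cdots & \zeta_\nu\end{pmatrix} = \det\begin{pmatrix}K(\xi_1,\zeta_1) & K(\xi_1,\zeta_2) & \cdots & K(\xi_1,\zeta_\nu)\\ K(\xi_2,\zeta_1) & K(\xi_2,\zeta_2) & \cdots & K(\xi_2,\zeta_\nu)\\ \vdots & \vdots & & \vdots\\ K(\xi_\nu,\zeta_1) & K(\xi_\nu,\zeta_2) & \cdots & K(\xi_\nu,\zeta_\nu)\end{pmatrix}$$

is employed. The above expressions are invoked in Section
4 for the computation of the Volterra kernels when
the input is of special type.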
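
The remark that replacing the integral by a finite sum yields a finite dimensional linear system can be sketched as follows. This is a generic midpoint-rule discretization of a Fredholm equation of the second kind, not the determinants method used in the paper; the function name, the choice of quadrature rule and the grid size are assumptions made for illustration.

```python
import numpy as np

def solve_fredholm2(kernel, f, lam, a, b, n=200):
    """Approximate x(t) - lam * int_a^b K(t, s) x(s) ds = f(t) on a grid.

    kernel(t, s) must accept broadcastable arrays and return the matching
    array of kernel values. Midpoint rule with n equal cells (assumed choice).
    """
    h = (b - a) / n
    t = a + h * (np.arange(n) + 0.5)        # midpoints of the n cells
    K = kernel(t[:, None], t[None, :])      # n x n matrix of kernel samples
    A = np.eye(n) - lam * h * K             # discretized operator I - lam*K
    x = np.linalg.solve(A, f(t))
    return t, x
```

For a constant kernel $K = \kappa$ on $[-\pi, \pi]$ with $\lambda = -1/(2\pi)$, the determinants method gives $\Gamma = \kappa/(1+\kappa)$, which provides a convenient check of the discretized solution.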

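An alternative expression for the system of eqs. (2)
and (4) is next derived, having the advantage that the
resulting format is pointwise linear in the kernels. More
specifically, let us introduce the LTI filters with impulse
responses

$$h_{2,k}(l) = h_2(k+l, l), \qquad k, l \in Z$$

Then

$$H_2(w_1, w_2) = \sum_k\sum_l h_2(k, l)\,e^{-jkw_1}e^{-jlw_2} = H_{2,0}(w_1+w_2) + \sum_{i=1}^{\infty} H_{2,i}(w_1+w_2)\big(e^{-jiw_1} + e^{-jiw_2}\big)$$

Hence eqs. (2) and (4) become

$$C_{yu}(-w) = H_1(w)C_2^u(w) + H_{2,0}(w)\,\frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\,dw_3$$
$$\quad + \sum_{i=1}^{\infty} H_{2,i}(w)\,\frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\big(e^{-ji(w-w_3)} + e^{-jiw_3}\big)\,dw_3 \qquad (7)$$

$$C_{yuu}(-(w-w_2), -w_2) = H_1(w)\,C_3^u(-(w-w_2), -w_2)$$
$$\quad + H_{2,0}(w)\Big[2C_2^u(w-w_2)C_2^u(w_2) + \frac{1}{2\pi}\int_{-\pi}^{\pi} C_4^u(-w_2, w-w_3, w_3)\,dw_3\Big]$$
$$\quad + \sum_{i=1}^{\infty} H_{2,i}(w)\Big[2C_2^u(w-w_2)C_2^u(w_2)\big(e^{-ji(w-w_2)} + e^{-jiw_2}\big) + \frac{1}{2\pi}\int_{-\pi}^{\pi} C_4^u(-w_2, w-w_3, w_3)\big(e^{-ji(w-w_3)} + e^{-jiw_3}\big)\,dw_3\Big] \qquad (8)$$

3 Banded second order Volterra forms

We say that (1) is banded Volterra if the matrix
$H_2$ associated with the second order kernel is a banded
matrix, i.e. there exists an integer M such that
$h_2(k_1, k_2) = 0$ for $|k_1 - k_2| > M$. The meaning of these
systems is that input products with sufficiently wide
time separation do not contribute to the output. In the
banded case eqs. (7), (8) reduce to the linear system
$Ax = b$, where the vector of the unknown parameters
is

$$x = [H_1(w)\;\; H_{2,0}(w)\;\; H_{2,1}(w)\;\cdots\; H_{2,M}(w)]^T$$

The first row of the matrix A has the form

$$\Big(C_2^u(w)\;\;\; \frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\,dw_3\;\;\; \frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\big(e^{-j(w-w_3)} + e^{-jw_3}\big)\,dw_3\;\cdots\; \frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\big(e^{-jM(w-w_3)} + e^{-jMw_3}\big)\,dw_3\Big)$$

The first column of A has the form

$$\big(C_2^u(w)\;\;\; C_3^u(-(w-w_{2,0}), -w_{2,0})\;\cdots\; C_3^u(-(w-w_{2,M}), -w_{2,M})\big)^T$$

Likewise the second column of A is

$$\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} C_3^u(w-w_3, w_3)\,dw_3\;\;\; 2C_2^u(w-w_{2,0})C_2^u(w_{2,0}) + \frac{1}{2\pi}\int_{-\pi}^{\pi} C_4^u(-w_{2,0}, w-w_3, w_3)\,dw_3\;\cdots\; 2C_2^u(w-w_{2,M})C_2^u(w_{2,M}) + \frac{1}{2\pi}\int_{-\pi}^{\pi} C_4^u(-w_{2,M}, w-w_3, w_3)\,dw_3\Big)^T$$

The remaining elements of A are

$$a_{m,k} = 2C_2^u(w-w_{2,m-2})\,C_2^u(w_{2,m-2})\big(e^{-j(k-2)(w-w_{2,m-2})} + e^{-j(k-2)w_{2,m-2}}\big) + \frac{1}{2\pi}\int_{-\pi}^{\pi} C_4^u(-w_{2,m-2}, w-w_3, w_3)\big(e^{-j(k-2)(w-w_3)} + e^{-j(k-2)w_3}\big)\,dw_3$$

Finally the vector b is given by

$$b = \big(C_{yu}(-w)\;\;\; C_{yuu}(-(w-w_{2,0}), -w_{2,0})\;\cdots\; C_{yuu}(-(w-w_{2,M}), -w_{2,M})\big)^T$$

If the banded Volterra model is restricted to compact
support Volterra kernels the above set up is analogous
with the frequency domain approach of [3].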
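
The decomposition of $H_2(w_1, w_2)$ into the diagonal filters $H_{2,i}$ can be checked numerically. The sketch below assumes a finite, symmetric kernel stored as an L x L array (so $h_{2,i}(l) = h_2(l+i, l)$ is the i-th sub-diagonal); both function names are illustrative only.

```python
import numpy as np

def h2_direct(h2, w1, w2):
    """Two-dimensional DTFT of a finite kernel h2(k, l) at (w1, w2)."""
    L = h2.shape[0]
    k = np.arange(L)
    phase = np.exp(-1j * (k[:, None] * w1 + k[None, :] * w2))
    return np.sum(h2 * phase)

def h2_from_diagonals(h2, w1, w2):
    """Same quantity assembled from the diagonal filters H_{2,i}(w1 + w2)."""
    L = h2.shape[0]
    w = w1 + w2
    total = 0j
    for i in range(L):
        # h_{2,i}(l) = h2(l + i, l): the i-th sub-diagonal of the kernel.
        diag = np.diagonal(h2, offset=-i)
        l = np.arange(len(diag))
        H2i = np.sum(diag * np.exp(-1j * l * w))
        if i == 0:
            total += H2i
        else:
            total += H2i * (np.exp(-1j * i * w1) + np.exp(-1j * i * w2))
    return total
```

The two routines agree only for symmetric kernels, since the sub-diagonals are reused for the super-diagonals. For a banded kernel the loop could stop at i = M.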
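4 Volterra systems with special non-Gaussian inputs

In this section identification of second order Volterra
systems using inputs of special type is considered.

Let us first assume that the input signal is an IID
zero mean random process with

$$c_k^u(i_1, \ldots, i_{k-1}) = \gamma_k\,\delta(i_1, \ldots, i_{k-1}),$$

where $\delta(i_1, \ldots, i_{k-1})$ is the (k-1) dimensional unit sample,
i.e.,

$$\delta(i_1, \ldots, i_{k-1}) = \begin{cases} 1 & \text{if } i_1 = \cdots = i_{k-1} = 0\\ 0 & \text{otherwise}\end{cases}$$

Then

$$K(w_2, w_3) = \frac{\gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^3}, \qquad \Delta(\lambda) = 1 + \frac{\gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^3} = \frac{2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^3},$$

$$\Delta(w_2, w_3; \lambda) = \frac{\gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^3}$$

and

$$\Gamma(w_2, w_3; \lambda) = \frac{\gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2}$$

Therefore the first and second degree kernels are given
by

$$H_2(w-w_2, w_2) = \frac{C_{yuu}(-(w-w_2), -w_2)}{2\gamma_2^2} - \frac{\gamma_4\gamma_2 - \gamma_3^2}{2\gamma_2^2\,(2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2)}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} C_{yuu}(-(w-w_3), -w_3)\,dw_3 - \frac{\gamma_3}{2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2}\,C_{yu}(-w)$$

$$H_1(w) = \frac{2\gamma_2^2 + \gamma_4}{2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2}\,C_{yu}(-w) - \frac{\gamma_3}{2\gamma_2^3 + \gamma_4\gamma_2 - \gamma_3^2}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} C_{yuu}(-(w-w_3), -w_3)\,dw_3$$

Next we assume that the input signal is obtained
as the output of a linear time invariant system with
transfer function $G(w)$ driven by a higher-order white
noise. Then the spectra of the input signal have the
form

$$C_2^u(w_1) = \gamma_2\,G(w_1)G^*(w_1)$$

$$C_3^u(w_1, w_2) = \gamma_3\,G(w_1)G(w_2)G^*(w_1+w_2)$$

$$C_4^u(w_1, w_2, w_3) = \gamma_4\,G(w_1)G(w_2)G(w_3)G^*(w_1+w_2+w_3)$$

It can be proved in this case that $\Delta(\lambda)$ and $\Delta(w_2, w_3; \lambda)$
are given by

$$\Delta(\lambda) = 1 + \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big(\frac{C_4^u(-\xi_1, w-\xi_1, \xi_1)\,C_2^u(w)}{2C_2^u(w)\,C_2^u(w-\xi_1)\,C_2^u(\xi_1)} - \frac{C_3^u(-(w-\xi_1), -\xi_1)\,C_3^u(w-\xi_1, \xi_1)}{2C_2^u(w)\,C_2^u(w-\xi_1)\,C_2^u(\xi_1)}\Big)\,d\xi_1$$

$$\Delta(w_2, w_3; \lambda) = \frac{C_4^u(-w_2, w-w_3, w_3)\,C_2^u(w) - C_3^u(-(w-w_2), -w_2)\,C_3^u(w-w_3, w_3)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)}$$

Hence

$$\Gamma(w_2, w_3; \lambda) = \frac{\Delta(w_2, w_3; \lambda)}{\Delta(\lambda)}$$

Therefore the Volterra kernels are

$$H_2(w-w_2, w_2) = \frac{C_{yuu}(-(w-w_2), -w_2)}{2C_2^u(w-w_2)\,C_2^u(w_2)} - \frac{C_{yu}(-w)\,C_3^u(-(w-w_2), -w_2)}{2C_2^u(w)\,C_2^u(w-w_2)\,C_2^u(w_2)}$$
$$\quad - \frac{1}{2\pi}\int_{-\pi}^{\pi}\Gamma(w_2, w_3; \lambda)\,\frac{C_{yuu}(-(w-w_3), -w_3)}{2C_2^u(w-w_3)\,C_2^u(w_3)}\,dw_3 + \frac{1}{2\pi}\int_{-\pi}^{\pi}\Gamma(w_2, w_3; \lambda)\,\frac{C_{yu}(-w)\,C_3^u(-(w-w_3), -w_3)}{2C_2^u(w)\,C_2^u(w-w_3)\,C_2^u(w_3)}\,dw_3$$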
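
For the IID input case the closed-form kernels can be evaluated directly once the cross-spectra are available. The following sketch works at a single frequency $w$ and assumes that $C_{yuu}(-(w-w_3), -w_3)$ is sampled on a uniform $w_3$ grid covering one period, so that the normalized integral $\frac{1}{2\pi}\int$ is approximated by a sample mean; the function and argument names are hypothetical.

```python
import numpy as np

def iid_kernels(c_yu, c_yuu, g2, g3, g4):
    """First and second degree kernels for an IID input at one frequency w.

    c_yu  : scalar, C_yu(-w)
    c_yuu : array, C_yuu(-(w - w3), -w3) on a uniform w3 grid over one period
    g2, g3, g4 : second, third and fourth order cumulants of the input
    """
    d = 2 * g2**3 + g4 * g2 - g3**2
    # Sample mean stands in for (1/2pi) * integral over w3 (assumed quadrature).
    mean_cyuu = np.mean(c_yuu)
    h2 = (c_yuu / (2 * g2**2)
          - (g4 * g2 - g3**2) / (2 * g2**2 * d) * mean_cyuu
          - g3 / d * c_yu)
    h1 = (2 * g2**2 + g4) / d * c_yu - g3 / d * mean_cyuu
    return h1, h2
```

In practice the cross-spectra would be replaced by estimates, so the recovered kernels inherit their estimation error.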
Abstract

The performance of many analogue and digital signal processing
systems is limited by nonlinear distortion mechanisms which can be
modelled with a Volterra series. The nonlinear distortion can be
compensated by the application of post- (or pre-) distortion based on
a Volterra inverse. The computational complexity associated with
this type of compensation can be very high, particularly for systems
with high nonlinearity order and long memory. In this paper we
determine the 3rd and 5th order analytical Volterra inverses, and
examine their associated computational complexity. We show how
the analytical Volterra inverse can be used to determine the memory
span of the kernels of an adaptive Volterra inverse, leading to
computational complexity expressions. We then compare the
computational complexity of the analytical and adaptive Volterra
inverse. The results show that the analytical inverse has a much lower
complexity than the adaptive inverse.

1.  Introduction 

The Volterra representation uses a set of functionals and kernels to
model a wide class of nonlinear systems with memory [1]. The
continuous time Volterra model is given by (1):

$$y(t) = H_0 + H_1[x(t)] + \cdots + H_n[x(t)] + \cdots + H_N[x(t)] \qquad (1)$$

where $H_0 = h_0$ is the DC term, $H_n[\cdot]$ is the nth order Volterra
operator given by (2), and $h_n(\tau_1, \tau_2, \ldots, \tau_n)$ is the nth order
Volterra kernel.

$$H_n[x(t)] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h_n(\tau_1, \tau_2, \ldots, \tau_n)\,x(t-\tau_1)\,x(t-\tau_2)\cdots x(t-\tau_n)\,d\tau_1\,d\tau_2\cdots d\tau_n \qquad (2)$$

A  Volterra  inverse  can  be  used  to  compensate  for  nonlinear 
distortion.  For  example,  in  a  previous  paper  [2]  we  demonstrated  how 
a  Volterra  inverse  may  be  used  to  compensate  for  nonlinearities  in  a 
sample  and  hold  with  input  dependent  timing  jitter.  In  this  paper  we 
give analytical expressions for the 3rd and 5th order Volterra inverses,
and  examine  their  computational  complexity.  We  show  how  the 
analytical  expressions  can  be  used  to  determine  the  memory  span  of 
the  kernels  of  an  adaptive  Volterra  inverse.  We  then  compare  the 
computational  complexity  of  analytical  and  adaptive  Volterra 
inverses,  illustrating  that  the  analytical  inverse  has  a  much  lower 
computational  complexity  than  the  adaptive  inverse. 


2.  Computational  complexity  of  the  Volterra 
model 

The discrete-time Nth order Volterra model with memory length
truncated to M for all orders, and symmetric Volterra kernels (to avoid
redundancy), can be written as:

$$y(k) = H_{(N)}[x(k)] = H_0 + H_1[x(k)] + \cdots + H_n[x(k)] + \cdots + H_N[x(k)] \qquad (3)$$

where

$$H_n[x(k)] = \sum_{m_1=0}^{M-1}\sum_{m_2=m_1}^{M-1}\cdots\sum_{m_n=m_{n-1}}^{M-1} h_n(m_1, m_2, \ldots, m_n)\,x(k-m_1)\,x(k-m_2)\cdots x(k-m_n) \qquad (4)$$

For the Volterra model given by (3), the total number of
multiplications required gives a measure of the complexity, $C(N, M)$:

$$C(N, M) = \sum_{n=1}^{N}\frac{(M-1+n)!}{(n-1)!\,(M-1)!} \qquad (5)$$


Fig. 1 shows how the complexity varies with Volterra model
nonlinearity order N, and memory M. The high computational
complexity  places  much  emphasis  on  the  development  of  efficient 
implementations  and  fast  kernel  estimation  algorithms  [3]-[6]. 
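
The growth of the multiplication count with order and memory can be tabulated directly from the complexity expression (5). The sketch below is a straightforward evaluation of that sum; the function name is an illustrative choice.

```python
from math import factorial

def volterra_complexity(N, M):
    """Multiplication count of an Nth order Volterra model with memory M,
    evaluated as the sum over n of (M-1+n)! / ((n-1)! (M-1)!)."""
    total = 0
    for n in range(1, N + 1):
        # Each term is an integer, so floor division is exact here.
        total += factorial(M - 1 + n) // (factorial(n - 1) * factorial(M - 1))
    return total
```

For example, the first order term alone reduces to M multiplications, as expected for an FIR filter with M taps.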


♦Contact  information:  Defence  Science  and  Technology  Organisation, 
Communications  Division,  PO  BOX  1500,  Salisbury,  South  Australia,  5108, 
Australia,  Tel:  +61-8-82596403,  Fax:  +61-8-82596328,  Email: 
john.tsimbinos@dsto.defence.gov.au 


Fig. 1: Computational complexity as a function of Volterra
model order N, and memory span M
3. The Volterra inverse

Consider Fig. 2, where an Nth order Volterra model, represented by
$H_{(N)}[\cdot]$ in (3), is followed by a pth order Volterra inverse
compensator $G_{(p)}[\cdot]$ as in (6). In this paper we consider two types of
inverses: an analytical inverse which is derived to eliminate all terms
up to pth order, and an adaptive inverse which is obtained by
minimising the mean square error between $x[k]$ and $\hat{x}[k]$.

Fig. 2: Nonlinear system followed by a Volterra inverse (nonlinear system with memory, followed by a pth order inverse producing the compensated output)

$$G_{(p)}[y(k)] = G_0 + G_1[y(k)] + \cdots + G_n[y(k)] + \cdots + G_p[y(k)] \qquad (6)$$


3.1  Analytical  Volterra  inverse 

The analytical pth order Volterra inverse $G_{(p)}$ is defined as one
which, when cascaded with the Nth order Volterra model $H_{(N)}$, results in a
system $Q[\cdot]$ with Volterra operators $Q_j[\cdot]$ in which the 1st order
Volterra kernel is a unit impulse and the higher jth order Volterra
kernels are zero, for j = 2, ..., p, [1], [7], as in (7).

$$\hat{x}(k) = G_{(p)}[y(k)] = Q[x(k)] = x(k) + Q_{p+1}[x(k)] + Q_{p+2}[x(k)] + \cdots + Q_{pN}[x(k)] = x(k) + \sum_{j=p+1}^{pN} Q_j[x(k)] \qquad (7)$$

We first consider a 3rd order Volterra inverse. Using (7), it is
possible to derive the expressions for the Volterra inverse operators:

$$G_1 = H_1^{-1}, \qquad G_2 = -G_1 H_2 G_1, \qquad (8)$$


G3  =  G,[-/72  +  +  GyH2\-HfiyH^-H^]Gy  (9) 

Fig. 3 gives a block diagram of the 3rd order Volterra inverse, and $G_3$
is shown in Fig. 4.

Fig. 3: 3rd order Volterra inverse

Fig. 4: $G_3$ of the 3rd order Volterra inverse


We now consider a 5th order Volterra inverse. It is possible to
obtain the 1st, 2nd, 3rd, 4th and 5th order Volterra inverse operators
[8]. However, for the purpose of this paper we will consider the case
of a 5th order Volterra model with only odd order terms, such as the
one we considered in [2]. Note that $G_2 = G_4 = 0$, and the remaining
Volterra inverse operators are:

$$G_1 = H_1^{-1}, \qquad G_3 = -G_1 H_3 G_1, \qquad (10)$$

G5  =  Gi[~.7^5  -H^[\  +  G,/73]  -3//3  +  Q.5H^GyH^  +  0.5A/3[2  +  GiW3]]Gi 

(11) 

Fig. 5 gives a block diagram of the 5th order Volterra inverse with
only odd order terms, and the 5th order operator $G_5$ is shown in
Fig. 6.

Fig. 6: $G_5$ of the 5th order inverse of a Volterra
system with only odd order kernels


3.2  Adaptive  Volterra  inverse 

It is also possible to obtain a Volterra inverse compensator by
using an adaptive approach illustrated in Fig. 7. The inverse is
obtained by minimising the mean square error between the ideal
output $x[k]$ and the compensated output $\hat{x}[k]$, as in (12).

Fig. 7: Volterra inverse by adaptive method

$$\mathrm{MSE} = E\big\{(x[k] - \hat{x}[k])^2\big\} \qquad (12)$$

The adaptive method of obtaining a Volterra inverse compensator
may appear to be more straightforward than deriving an analytical pth
order Volterra inverse. After all, the kernels for the Volterra inverse are
estimated without the need for obtaining a Volterra model of the
original system. However, this method would be more difficult to
apply in practice. Setting the nonlinearity order and memory of the
adaptive Volterra inverse requires some prior information about the
original system. The significant nonlinearity orders of the system
would determine the nonlinearity order of the adaptive inverse.
However, determining the memory requirement of the inverse
Volterra kernels is not trivial, even if the Volterra kernel memories of
the original system are known. The memory of the inverse Volterra
kernels will usually be higher than that of the Volterra kernels of the
original system. The memory lengths of the kernels of the derived
Volterra inverse given in Section 3.1 provide a method of determining
the required memory of the kernels for the adaptive Volterra inverse.
This will be discussed in Section 4.2.

4.  Computational  complexity  of  analytical  and 
adaptive  Volterra  inverses 

We will now determine the computational complexity of the
Volterra inverses. We restrict our discussion to two cases: one
involving a 3rd order Volterra system, the other involving a 5th order
Volterra system with only odd order terms. For both cases, all Volterra
system kernels are assumed to have the same memory span, M.

4.1  Complexity  of  analytical  Volterra  inverse 

First we consider the complexity of a 3rd order analytical Volterra
inverse. We assume that an IIR filter is used to implement the first
order inverse operator $G_1 = H_1^{-1}$, resulting in a memory span of M,
the same as that of $H_1$. In any case, $G_1$ does not contribute
significantly to the overall computational complexity of a 3rd order
Volterra inverse. From (8), we have $G_2 = -G_1 H_2 G_1$, and the
computational complexity contributions of $G_2$ are summarised in
Table 1.


Components of inverse   Memory   Nonlinearity   Complexity        Number of
operator G2             span     order                            components

G1                      M        1              M                 2
H2                      M        2              (M+1)!/(M-1)!     1
Scaling coefficients    -        -              1                 1

Table 1: Computational complexity contributions of G2

From (9) and Fig. 4, we can summarise the computational complexity
contributions of $G_3$ in Table 2.


Components of inverse   Memory   Nonlinearity   Complexity         Number of
operator G3             span     order                             components

G1                      M        1              M                  3
H2                      M        2              (M+1)!/(M-1)!      4
H3                      M        3              (M+2)!/(2(M-1)!)   1
Scaling coefficients    -        -              1                  2

Table 2: Computational complexity contributions of G3


By summing all contributions of Table 1 and Table 2, we can obtain
the total complexity of the 3rd order analytical inverse, $C_{an}(3, M)$:

$$C_{an}(3, M) = 3 + 6M + 5\,\frac{(M+1)!}{(M-1)!} + \frac{(M+2)!}{2(M-1)!} \qquad (13)$$


Next we consider the complexity of the 5th order analytical
Volterra inverse with only odd order terms. By considering all
computational complexity contributions from each of the
components shown in Fig. 5, it can be shown that the complexity of
the 5th order analytical inverse with only odd order terms is given by
$C_{an}(5, M)$:


eUS,  M)=  7 +  6M  + 


(M  +  2)! 
2(M-1)! 


.  (.M  +  4)1 
^  4!(M-1)! 


(14) 


4.2  Complexity  of  adaptive  Volterra  inverse 

Now consider an adaptive Volterra inverse. Sufficient memory has
to be set for the measurement of all inverse Volterra kernels. In order
to determine the required memory span of each adaptive Volterra
inverse kernel, it is necessary to make use of analytical Volterra
inverse operators. In general, the adaptive Volterra inverse (which
minimises the mean square error at the output) would not be directly
equivalent to, or give the same compensation performance as, the pth
order analytical Volterra inverse (which is designed to remove the
nonlinear distortion terms up to pth order). However, the analytical
Volterra inverse operators provide a method of determining the
memory requirements of the adaptive Volterra inverse operators.

First we consider the 3rd order Volterra system with all orders, 1st,
2nd and 3rd, having a memory span M. Again we assume that the
memory span for $G_1$ is M. To determine the memory requirement of
the 2nd and 3rd order Volterra kernels, we make use of the analytical
expressions for $G_2$ and $G_3$ given in (8) and (9). Since the memory
span of $G_1$ and $H_2$ is M, the memory requirement of $G_2$ is
(2(M-1) + 1). Since the memory span of $G_1$, $H_2$, and $H_3$ is M, the
memory requirement of $G_3$ is (4(M-1) + 1). Using these memory
spans as a guide to the memory requirement of the adaptive Volterra
inverse operators, it can be shown that the computational complexity
of the 3rd order adaptive inverse, $C_{ad}(3, M)$, is given by:

$$C_{ad}(3, M) = M + \frac{(2M)!}{(2M-2)!} + \frac{(4M-1)!}{2(4M-4)!} \qquad (15)$$


Now we consider an adaptive 5th order Volterra inverse with only
odd order terms. Using the kernel memory length of the Volterra
model, and the analytical Volterra inverse operators $G_3$ and $G_5$, given
in (10) and (11), we can determine the memory requirement of each
of the adaptive Volterra inverse kernels. We have $G_3 = -G_1 H_3 G_1$.
Since the memory span of $G_1$ and $H_3$ is M, the overall memory
requirement of $G_3$ is (2(M-1) + 1). Now consider the expression for
$G_5$. Since the memory span of $G_1$, $H_3$ and $H_5$ is M, the memory
requirement of $G_5$ is (4(M-1) + 1). Again, using these memory
spans as a guide to the memory requirement of the adaptive Volterra
inverse operators, it can be shown that the computational complexity
of the 5th order adaptive inverse, $C_{ad}(5, M)$, is given by:

$$C_{ad}(5, M) = M + \frac{(2M+1)!}{2(2M-2)!} + \frac{(4M+1)!}{4!\,(4M-4)!} \qquad (16)$$
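
To make the comparison concrete, the 3rd order complexity expressions can be evaluated numerically. The sketch below is illustrative only: the analytical count uses a coefficient of 5 on the second order term, which is an assumption inferred here from the component counts in Tables 1 and 2 (1 + 4 uses of $H_2$), and the adaptive count follows (15) with memory spans 2M-1 and 4M-3. Function names are hypothetical.

```python
from math import factorial

def term(n, m):
    """Multiplications for one nth order operator with memory span m."""
    return factorial(m - 1 + n) // (factorial(n - 1) * factorial(m - 1))

def analytical_3rd(M):
    # 3 scaling multiplications, six first order blocks, five H2 blocks
    # (assumed from Tables 1 and 2) and one H3 block.
    return 3 + 6 * term(1, M) + 5 * term(2, M) + term(3, M)

def adaptive_3rd(M):
    # Kernels of order 1, 2, 3 with memory spans M, 2M-1 and 4M-3.
    return term(1, M) + term(2, 2 * M - 1) + term(3, 4 * M - 3)
```

Evaluating both functions over a range of M reproduces the qualitative gap between the two inverses, since the adaptive third order term grows with the cube of 4M rather than M.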
The computational complexities of the adaptive inverses are
compared with those of the analytical (derived) inverses for varying
values of M. The 3rd order case is shown in Fig. 8, and the 5th order
case is shown in Fig. 9. As can be seen, the analytical inverse has a
much lower computational complexity than the corresponding
adaptive inverse. This may not be surprising since the analytical
inverse is derived using the actual Volterra model of the system which
would be obtained by estimating a set of Volterra kernels. The
adaptive inverse, on the other hand, is implemented using less prior
information about the system, resulting in a more general Volterra
inverse, and a correspondingly higher computational complexity. We
can also compare the complexity of the Volterra inverses with that of
the corresponding Volterra models shown in Fig. 1. It can be shown
that the computational complexity of the analytical Volterra inverse
is of the same order of magnitude as the corresponding Volterra
model, while the adaptive Volterra inverse has much higher
complexity.


5.  Conclusion 

In this paper we have presented the architectures for 3rd and 5th
order  analytical  Volterra  inverses.  We  showed  how  a  Volterra  inverse 
can  also  be  obtained  by  an  adaptive  approach.  We  explained  how  the 
analytical  Volterra  inverse  may  be  used  to  determine  the  memory 
requirements  of  the  adaptive  Volterra  inverse  kernels.  The 
computational  complexity  of  the  two  types  of  Volterra  inverse 
(analytical  and  adaptive)  was  examined.  It  was  shown  that  the 
analytical  Volterra  inverse  gives  much  lower  computational 
complexity  than  its  adaptive  counterpart.  The  computational 
complexity  of  the  analytical  Volterra  inverse  is  of  the  same  order  of 
magnitude  as  the  Volterra  model,  while  the  more  general,  adaptive 
Volterra  inverse  has  much  higher  complexity  than  the  Volterra  model 
being  compensated.  Processor  technology  governs  the  operation  rate, 
so  for  high  sampling  rates,  real  time  compensation  using  a  Volterra 
inverse  is  limited  to  low  order,  short  memory  cases.  Using  a  Volterra 
model of the system to derive an analytical pth order Volterra inverse
would  give  implementation  advantages  over  the  use  of  the  more 
general,  higher-complexity,  adaptive  Volterra  inverse.  This  paper 
carried  out  comparisons  for  cases  involving  the  compensation  of 
Volterra  systems  for  which  all  kernels  are  assumed  to  have  the  same 
memory  span.  An  extension  to  this  work  would  consider  a  Volterra 
system  with  different  memory  spans  for  each  order.  Future  work  will 
investigate  the  use  of  more  efficient  implementation  and  computing 
structures  to  reduce  the  computational  burden  of  the  Volterra  inverse 
based  compensators. 
Abstract

Estimation of frequency rate of linear frequency
modulated signals based on phase angles of fourth order
sample moments is considered. Three low-complexity
estimators are proposed whose performance is close to
optimal, that is, their error variance is close to the
Cramer-Rao lower bound.


1.  Introduction 


An  important  signal  processing  problem  is  the  es¬ 
timation  of  the  parameters  of  complex-valued  linear 
frequency  modulated  signals  from  noisy  discrete  time 
observations.  Often,  the  frequency  rate  is  the  only  pa¬ 
rameter  of  interest.  In  this  paper,  a  novel  set  of  meth¬ 
ods  is  proposed  for  this  estimation  problem,  that  is 
methods  based  on  normalized  phase  angles  of  fourth 
order  sample  moments. 

An extensive review of different algorithms for this
estimation problem is given in [2]. Methods that
have  been  suggested  include  maximum  likelihood  es¬ 
timates,  [1],  estimates  utilizing  the  polynomial  phase 
transform,  [4],  and  a  Markov  based  estimator  derived 
from  the  phase  angles  of  the  sequence  where 

Zk  =  where  denotes  the 

observations,  [3].  For  the  estimator  proposed  in  [4] 
the frequency rate is estimated by the spectral position
of the highest peak of the magnitude squared discrete
time ambiguity function. This method is easily
implemented by a grid search of the periodogram of
the sequence $\{y_k\}_{k=1+\tau}^{N}$ where $y_k = x_k x_{k-\tau}^*$. For the
choice $\tau = N/2$ the quotient of the error variance divided
by the Cramer-Rao lower bound (CRB) tends
for SNR $\to \infty$ to $16/15 \approx 1.07$, [4]. The estimators
proposed here require no numerical search, thus they
directly provide an estimate of the frequency rate.
Consider,

$$x_k = s_k + v_k = A e^{j\phi_k} + v_k, \qquad k = 1, \ldots, N \qquad (1)$$

where A is a complex-valued amplitude, and the noise
$v_k$ is zero mean complex-valued white Gaussian with
variance $\sigma^2$. The real and imaginary parts of $v_k$ are independent
with variances $\sigma^2/2$, respectively. Further,

$$\phi_k = 2\pi\Big(fk + \frac{\alpha}{2}k^2\Big) \qquad (2)$$

where $f \in (-1, 1)$ is the normalized frequency, and $\alpha \in
(-0.5, 0.5)$ is the frequency rate. The parameters ($A$,
$f$, $\alpha$, $\sigma^2$) are all unknown, but often the frequency rate
is the only parameter of interest. For this estimation
problem, the CRB is given by, [3]

$$\mathrm{var}[\hat{\alpha}] \ge \frac{90}{\mathrm{SNR}\,\pi^2 N(N^2-1)(N^2-4)} \qquad (3)$$

where $\hat{\alpha}$ denotes the estimated frequency rate and
where the SNR is defined by SNR $= |A|^2/\sigma^2$.

A set of frequency rate estimators is proposed based
on normalized phase angles of fourth order sample moments,
that is

$$\hat{C}(m) = \sum_{k=2m+1}^{N} x_k\,(x_{k-m}^*)^2\,x_{k-2m}, \qquad m = 1, \ldots, \frac{N-1}{2} \qquad (4)$$

where, for simplicity, N is assumed to be odd. Further,

$$\hat{\Phi}(m) = \frac{\angle[\hat{C}(m)]}{2\pi m^2}. \qquad (5)$$

In (4)-(5), * denotes complex conjugate, and $\angle[\cdot]$ denotes
the phase angle (in $(-\pi, \pi)$) of the expression
between the brackets. If $|\alpha| > 1/(2m^2)$, equation (5)
has to be replaced by another one that takes into
account the phase unwrapping, for example

$$\hat{\Phi}(m) = \begin{cases}\dfrac{\angle[\hat{C}(m)]}{2\pi m^2} & m = 1\\[2ex] \dfrac{\angle[\hat{C}(m)] + 2\pi\psi(m)}{2\pi m^2} & m = 2, \ldots, J\end{cases} \qquad \psi(m) = \Big\{m^2\hat{\Phi}(1) - \frac{\angle[\hat{C}(m)]}{2\pi}\Big\}_r \qquad (7)$$

where $\{\cdot\}_r$ denotes the round-off operation to the nearest
integer applied to the quantity between the brackets.
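
The phase-based statistics can be sketched in a few lines. The code below assumes the fourth order sample moment takes the form $\hat{C}(m) = \sum_k x_k (x_{k-m}^*)^2 x_{k-2m}$, which is consistent with the normalization by $2\pi m^2$ but is an assumption where the source equation is illegible, and it applies the unwrapping rule with $\hat{\Phi}(1)$ as the reference. The function name is illustrative.

```python
import numpy as np

def phase_estimates(x):
    """Unwrapped normalized phase angles Phi(m), m = 1..J, J = (N-1)//2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    J = (N - 1) // 2
    phi = np.zeros(J)
    for m in range(1, J + 1):
        # Zero-based indices 2m..N-1 correspond to k = 2m+1..N in the text.
        k = np.arange(2 * m, N)
        C = np.sum(x[k] * np.conj(x[k - m]) ** 2 * x[k - 2 * m])
        ang = np.angle(C)
        if m == 1:
            phi[0] = ang / (2 * np.pi)
        else:
            # Integer number of 2*pi wraps predicted from Phi(1).
            psi = np.round(m ** 2 * phi[0] - ang / (2 * np.pi))
            phi[m - 1] = (ang + 2 * np.pi * psi) / (2 * np.pi * m ** 2)
    return phi
```

For a noiseless chirp every entry of the returned vector equals the true frequency rate, so any weighted average with unit-sum weights recovers it as well.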

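The estimator is motivated by the fact that for a
noiseless signal it gives the correct (true) value of
the frequency rate, because

$$C(m) = \sum_{k=2m+1}^{N} s_k\,(s_{k-m}^*)^2\,s_{k-2m} = (N-2m)\,|A|^4\,e^{j2\pi\alpha m^2} \qquad (8)$$

Here, $C(m)$ is the sample moment calculated for the
noiseless signal ($v_k = 0$).

2. Covariance elements of $\hat{\Phi}(m)$

First, the covariance elements of $\hat{\Phi}(m)$ are expressed
in terms of covariance elements of $\hat{C}(m)$. Let $\delta C(m) =
\hat{C}(m) - C(m)$ and

$$\delta\Phi(m) = \hat{\Phi}(m) - \Phi(m) = \frac{1}{2\pi m^2}\,\angle\Big[1 + \frac{\delta C(m)}{C(m)}\Big] \approx \frac{1}{2\pi m^2}\,\mathrm{Im}\Big[\frac{\delta C(m)}{C(m)}\Big] \qquad (10)$$

The last approximation in (10) is valid for $|\delta C(m)/C(m)| \ll
1$. Using the assumption that the noise is circular
white Gaussian one can find that $E[\hat{C}(m)] = C(m)$ for
$m > 0$. Next, let

$$G_{m,n} = \mathrm{cov}(\hat{C}(m), \hat{C}(n)) = E[\delta C^*(m)\,\delta C(n)] \qquad (11)$$
$$H_{m,n} = \mathrm{cov}(\hat{C}^*(m), \hat{C}(n)) = E[\delta C(m)\,\delta C(n)] \qquad (12)$$
$$R_{m,n} = \mathrm{cov}(\hat{\Phi}(m), \hat{\Phi}(n)) \approx E[\delta\Phi(m)\,\delta\Phi(n)]. \qquad (13)$$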

In  this  paper,  frequency  rate  estimators  are  con¬ 
sidered  that  are  based  on  the  set  of  sample  mo- 
ments  {C'(m)}^^i  where  J  =  -  l)/2.  For 

the  unwrapped  sequence  {$(m)};(^_j  it  holds  that 
#(m)  =  a  +  e(m)  where  e(m)  is  a  zero  mean  col¬ 
ored  noise.  Therefore,  with  1  =  (1  •  •  •  1)^  and 
'9  =  (^(1)  •••  the  Markov  estimator  of  a 

is  given  by,  [5] 

“  ITR-Il  (9) 

where  R  is  the  (J|  J)-covariance  matrix  with  elements 
Rm,n  =cov($(m),#(n)). 

The  matrix  R  depends  on  SNR  as  well  as  on  iV' 
and  J .  A  full  expression  for  R  is  in  principle  possible 
to  derive.  In  this  paper,  however,  two  approximate 
expressions  of  R  are  derived  and  used  instead  of  R. 
One  expression  valid  for  high  SNR  (SNR/iV  >  1),  and 
one  valid  for  low  SNR  (JNT/SNR^  1).  The  motivation 
to  use  approximate  expressions  for  R  is  as  follows.  The 
SNR  is,  in  general,  unknown  and  has  to  be  estimated 
leading  to  a  multi  step  procedure  where  in  the  final 
step  the  frequency  rate  is  estimated  using  (9)  with  R 
replaced  by  an  estimate  R.  The  use  of  approximate 
expressions,  however,  give  estimators  independent  of 
SNR,  leading  to  a  direct  method  for  which  closed  form 
expressions  can  be  derived.  As  shown  in  the  sequel,  the 
performance  of  the  proposed  methods  is  (very)  close  to 
the  CRB. 


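Using (10) it follows that

$$R_{m,n} \approx \frac{1}{4\pi^2 m^2 n^2}\, E\left[ \frac{1}{2i}\left( \frac{\delta C(m)}{C(m)} - \frac{\delta C^*(m)}{C^*(m)} \right) \times \frac{1}{2i}\left( \frac{\delta C(n)}{C(n)} - \frac{\delta C^*(n)}{C^*(n)} \right) \right] = \frac{1}{8\pi^2 m^2 n^2}\, \mathrm{Re}\left\{ \frac{G_{m,n}}{C^*(m) C(n)} - \frac{H_{m,n}}{C(m) C(n)} \right\}, \tag{14}$$

where the identity $\mathrm{Im}\{z\} = (z - z^*)/(2i)$ was used. Next, note the dependence of $G_{m,n}$, $H_{m,n}$ and $R_{m,n}$ on the SNR. Following the reasoning of [6], it can be seen that

$$G_{m,n} = \sum_{k=1}^{4} g_{mnk}\, \mathrm{SNR}^{-k} \tag{15}$$
$$H_{m,n} = \sum_{k=1}^{4} h_{mnk}\, \mathrm{SNR}^{-k} \tag{16}$$
$$R_{m,n} = \sum_{k=1}^{4} r_{mnk}\, \mathrm{SNR}^{-k}. \tag{17}$$

For simplicity, we restrict ourselves to calculating the terms proportional to $\mathrm{SNR}^{-1}$ and $\mathrm{SNR}^{-4}$. Thus, we obtain two approximations to the true covariance elements: the former approximation is valid for high-SNR scenarios, and the latter one is good for low-SNR scenarios. Indeed, in the latter case the sample size has to be considerably large in order to achieve reasonable estimates.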
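The decomposition of the covariance elements in (15)-(17) corresponds to the decomposition

$$\delta C(m) = \sum_{k=1}^{4} \delta^{(k)} C(m), \tag{18}$$

where $\delta^{(k)} C(m)$ consists of the terms where the noise appears in the $k$-th power, namely

$$\delta^{(1)} C(m) = \sum_{k=2m+1}^{N} \left[ v_k s^*_{k-m} s^*_{k-m} s_{k-2m} + 2 s_k v^*_{k-m} s^*_{k-m} s_{k-2m} + s_k s^*_{k-m} s^*_{k-m} v_{k-2m} \right] \tag{19}$$
$$\vdots$$
$$\delta^{(4)} C(m) = \sum_{k=2m+1}^{N} v_k v^*_{k-m} v^*_{k-m} v_{k-2m}. \tag{20}$$

A straightforward calculation gives

$$g_{mn4}\, \mathrm{SNR}^{-4} = E[\delta^{(4)} C^*(m)\, \delta^{(4)} C(n)] = \sum_{k=2m+1}^{N} \sum_{l=2n+1}^{N} E[v^*_k v_{k-m} v_{k-m} v^*_{k-2m}\, v_l v^*_{l-n} v^*_{l-n} v_{l-2n}] = 2(N - 2m)\, \sigma^8 \delta_{mn} \tag{21}$$

$$h_{mn4}\, \mathrm{SNR}^{-4} = E[\delta^{(4)} C(m)\, \delta^{(4)} C(n)] = \sum_{k=2m+1}^{N} \sum_{l=2n+1}^{N} E[v_k v^*_{k-m} v^*_{k-m} v_{k-2m}\, v_l v^*_{l-n} v^*_{l-n} v_{l-2n}] = 0 \tag{22}$$

$$r_{mn4}\, \mathrm{SNR}^{-4} = \frac{1}{8\pi^2 m^4}\, \frac{g_{mn4}\, \mathrm{SNR}^{-4}}{|C(m)|^2} = \frac{\delta_{mn}\, \mathrm{SNR}^{-4}}{4\pi^2 m^4 (N - 2m)}, \qquad m, n = 1, \ldots, J. \tag{23}$$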
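In order to evaluate the coefficient $r_{mn1}$, introduce a normalized noise $\varepsilon_k = v_k/s_k$. Properties of $\varepsilon_k$ read $E[\varepsilon^*_m \varepsilon_n] = (\sigma^2/|A|^2)\, \delta_{mn}$, $E[\varepsilon_m \varepsilon_n] = 0$. Using this notation, a combination of (8) and (19) gives

$$\delta^{(1)} C(m) = \frac{C(m)}{N - 2m} \sum_{k=2m+1}^{N} \left[ \varepsilon_k + 2\varepsilon^*_{k-m} + \varepsilon_{k-2m} \right] \tag{24}$$

$$g_{mn1} = E[\delta^{(1)} C^*(m)\, \delta^{(1)} C(n)]\, \mathrm{SNR} = \frac{C^*(m) C(n)}{(N - 2m)(N - 2n)} \sum_{k=2m+1}^{N} \sum_{l=2n+1}^{N} \left[ \delta_{k,l} + \delta_{k,l-2n} + \delta_{k-2m,l} + \delta_{k-2m,l-2n} + 4\delta_{k-m,l-n} \right] \tag{25}$$

$$h_{mn1} = E[\delta^{(1)} C(m)\, \delta^{(1)} C(n)]\, \mathrm{SNR} = \frac{2\, C(m) C(n)}{(N - 2m)(N - 2n)} \sum_{k=2m+1}^{N} \sum_{l=2n+1}^{N} \left[ \delta_{k,l-n} + \delta_{k-2m,l-n} + \delta_{k-m,l} + \delta_{k-m,l-2n} \right] \tag{26}$$

$$r_{mn1} = \frac{1}{8\pi^2 m^2 n^2}\, \mathrm{Re}\left\{ \frac{g_{mn1}}{C^*(m) C(n)} - \frac{h_{mn1}}{C(m) C(n)} \right\}. \tag{27}$$

For $m, n < N/4$ it holds that

$$r_{mn1} = \frac{\max(0, \min(m, n) - |m - n|)}{2\pi^2 m^2 n^2 (N - 2m)(N - 2n)}. \tag{28}$$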

3.  Three  Estimators 


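The low SNR estimator

For large $N$ and low SNR, it follows from (23) that $R$ is approximately diagonal, given by

$$R \approx \frac{1}{4\pi^2 N\, \mathrm{SNR}^4}\, \mathrm{diag}\left(1, \frac{1}{2^4}, \ldots, \frac{1}{J^4}\right). \tag{29}$$

Inserting (29) into (9) gives

$$\hat{\alpha} = \frac{30 \sum_{k=1}^{J} k^4\, \hat{\Phi}(k)}{J(J + 1)(2J + 1)(3J^2 + 3J - 1)}. \tag{30}$$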
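The closed form (30) is the Markov estimator (9) evaluated for a diagonal $R$ whose $k$-th diagonal element is proportional to $1/k^4$, so each $\hat{\Phi}(k)$ receives the weight $k^4 / \sum_j j^4$. The sketch below, with hypothetical function names, checks that these weights sum to one and applies them to a vector of phase estimates.

```python
from fractions import Fraction

def low_snr_weights(J):
    """Weights of (30): w_k = 30 k^4 / (J (J+1) (2J+1) (3J^2 + 3J - 1))."""
    denom = J * (J + 1) * (2 * J + 1) * (3 * J * J + 3 * J - 1)
    return [Fraction(30 * k ** 4, denom) for k in range(1, J + 1)]

def low_snr_estimate(phi_hat):
    """Weighted mean of the unwrapped estimates Phi(1..J) with the weights above."""
    weights = low_snr_weights(len(phi_hat))
    return sum(float(w) * p for w, p in zip(weights, phi_hat))
```

Because the weights grow as $k^4$, the last element $\hat{\Phi}(J)$ dominates the estimate, which is consistent with the remark that motivates the simple estimator.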


The  high  SNR  estimator 

For high SNR, the elements of $R$ are approximately equal to $r_{mn1}\, \mathrm{SNR}^{-1}$, where $r_{mn1}$ is given by (27). Note that both this and the above estimator assign the largest weight to $\hat{\Phi}(J)$. This fact motivates the simple estimator proposed below.


A  simple  estimator 
Consider the simple estimator

$$\hat{\alpha} = \hat{\Phi}(J), \tag{31}$$

where $\hat{\Phi}(J)$ is calculated, for example, using (6)-(7). For $J < N/4$, the asymptotic variance of $\hat{\alpha}$ for SNR/$N$ > 1 directly follows from (27):

$$\mathrm{var}[\hat{\alpha}] = \frac{r_{JJ1}}{\mathrm{SNR}} = \frac{1}{2\pi^2\, \mathrm{SNR}\, J^3 (N - 2J)^2}. \tag{32}$$

The variance (32) is analytically minimized for $J = 3N/10$, for which the quotient of the error variance divided by the CRB is $115/90 \approx 1.27$.
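The choice $J = 3N/10$ can be checked numerically. The sketch below assumes the variance expression $1/(2\pi^2\, \mathrm{SNR}\, J^3 (N - 2J)^2)$ and a CRB of the form $90/(\mathrm{SNR}\, \pi^2 N (N^2 - 1)(N^2 - 4))$; both are reconstructions of the printed formulas, and the function names are hypothetical.

```python
import math

def simple_estimator_variance(J, N, snr):
    """High-SNR variance of the simple estimator, as in (32)."""
    return 1.0 / (2.0 * math.pi ** 2 * snr * J ** 3 * (N - 2 * J) ** 2)

def crb(N, snr):
    """Cramer-Rao bound for the frequency rate, assumed form of (3)."""
    return 90.0 / (snr * math.pi ** 2 * N * (N ** 2 - 1) * (N ** 2 - 4))

def best_J(N):
    """Integer J in 1..(N-1)/2 that minimizes the variance (SNR cancels out)."""
    candidates = range(1, (N - 1) // 2 + 1)
    return min(candidates, key=lambda J: simple_estimator_variance(J, N, 1.0))

def efficiency_ratio(J, N):
    """Variance divided by the CRB; independent of SNR."""
    return simple_estimator_variance(J, N, 1.0) / crb(N, 1.0)
```

For a large odd sample size such as $N = 1001$, `best_J` lands next to $3N/10$ and `efficiency_ratio` is roughly 1.29, in line with the quotient quoted in the text.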

Robust  phase  unwrapping 

A significant improvement of the algorithm performance in low-SNR scenarios can be achieved using an alternative robust unwrapping scheme in (6) and (7) that proceeds recursively: in (7), $\hat{\Phi}(1)$ is replaced with $\bar{\Phi}_m$, which is an average of the previous estimates $\{\hat{\Phi}(k)\}_{k=1}^{m-1}$. The quantity $\bar{\Phi}_m$ is calculated as the arithmetic mean of $\{\hat{\Phi}(k)\}_{k=1}^{m-1}$, for $m > 4$ with the addition that the largest and the smallest terms are excluded.

Figure 1. The quotient of the error variance divided by the CRB as a function of the design variable J (panels: SNR = 6 dB and SNR = 30 dB).

Figure 2. The quotient of the error variance divided by the CRB as a function of SNR.

Numerical Examples

1000 realizations, each of length $N = 31$, are generated for the model (1) with $A = e^{i\theta}$, where $\theta$ is uniformly distributed in $[0, 2\pi]$, $f = 0.3$, and $\alpha = 0.1$. The performance for SNR = 6 dB and SNR = 30 dB is studied in Fig. 1, where the efficiency in terms of the quotient of the error variance divided by the CRB is shown versus $J$.

From the curves in Fig. 1 one can note that for low (high) SNRs the "low SNR" ("high SNR") estimator outperforms both the other methods in terms of efficiency. As expected from (32), the performance of the "simple" estimator is inferior to that of the other methods. For SNR = 30 dB the empirical efficiency is close to the predicted theoretical result $r_{JJ1}/\mathrm{SNR}$. In general, the curves in Fig. 1 have two minima with respect to $J$.

The results from a similar experiment with performance versus SNR are given in Fig. 2. Here, $J = 14$ for the "low SNR" estimator, $J = 12$ for the "high SNR" estimator, and $J = 8$ for the "simple" estimator. The frequency rate estimators in [3] and [4] are also considered. The SNR threshold for the three estimators is 4 dB below the threshold for the estimator in [3]; however, the latter estimator performs marginally better in the SNR range between 10 dB and 20 dB. For the estimator in [3] the threshold is sharp at 10 dB, while for the estimators proposed in this paper there is a graceful degradation in performance for low SNRs above the threshold at 6 dB. The SNR threshold for the method in [4] is 4 dB, and thus this method has the lowest threshold among the considered estimators. For high SNR its efficiency is close to the predicted theoretical result, that is, slightly inferior performance compared to the "high SNR" estimator and the estimator in [3].

4.  Conclusions 

Three algorithms for estimating the frequency rate of a noisy complex-valued linear FM signal have been proposed, and their performance has been characterized. The methods rely on phase angle calculations of fourth-order sample moments of the noisy signal. It has been demonstrated that the performance of the proposed methods is nearly optimal for proper choices of the design variable $J$. The proposed algorithms require no numerical search, and can achieve a lower SNR threshold than the algorithm in [3].
Abstract

The work is devoted to the mathematical theory of the "Caterpillar" method, which has proved to be a very powerful tool for the analysis of time series. This method is based on the use of the principal component analysis technique applied to a multivariate sample which is obtained from the initial sample by the method of delays. A natural language for analysing the method is the Hilbert-Schmidt operator theory. We give conditions under which two deterministic functions are completely separated from each other for a finite period of observations. We also show that under mild conditions any deterministic function can be asymptotically separated from any ergodic random noise.


1. Introduction

The work is devoted to the mathematical theory of a method of time series analysis. A natural language for analysing the method is the Hilbert-Schmidt operator theory.

Let $f$ be a function on $[0, t]$. We assume that this function belongs to a Hilbert space $H$ and is a realization of some, perhaps random, process whose characteristics are unknown. We also assume that $f$ can be represented as a sum of several functions $f_i$, with every function related to a certain effect or to the noise component. We thus seek an expansion

$$f = \sum_i f_i, \tag{1}$$

where the terms $f_i$ are "interpretable" and "independent".

We do not assume any parametric model for $f$, and therefore the regression analysis technique cannot be used to get (1). "Independence" of the $f_i$ in (1) can sometimes be achieved through an expansion of $f$ with respect to an orthogonal basis, but the selection of the basis presents a major difficulty and cannot be resolved uniquely.

The essence of the present approach is that the functions $f_i$ are constructed through $f$ itself. More precisely, in the case of discrete time, the $f_i$ are related to principal components of a multivariate sample generated from $f$ by introducing the lag, or delay, variable.

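2. Description of the Principal Scheme

Let $\{f_i\}_{i=1}^{N}$ be a numerical sequence, or time series, and let $r$, $1 < r < N$, be an integer. Define a collection of $r$-dimensional vectors $x^{(1)}, \ldots, x^{(n)}$, $n = N - r + 1$, by the formula $x^{(k)} = (x_1^{(k)}, \ldots, x_r^{(k)})^T$, where $x_i^{(k)} = f_{k+i-1}$, and define the matrix

$$X = \left(x^{(1)}, \ldots, x^{(n)}\right) = \begin{pmatrix} f_1 & f_2 & \ldots & f_n \\ f_2 & f_3 & \ldots & f_{n+1} \\ \vdots & \vdots & & \vdots \\ f_r & f_{r+1} & \ldots & f_N \end{pmatrix}.$$

This matrix will be called the noncentered matrix of delay observations. Define the mean vector $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)^T$, where

$$\bar{x}_i = \frac{1}{n} \sum_{k=1}^{n} x_i^{(k)}.$$

Subtracting this vector from each of $x^{(1)}, \ldots, x^{(n)}$ we get the matrix $Y$ of centered vectors $y^{(1)}, \ldots, y^{(n)}$.

Consider now the covariance matrix of the vectors $y^{(1)}, \ldots, y^{(n)}$, considered as an $n$-sample of $r$-dimensional vectors, and apply the principal component method to this sample. Let

$$V = \begin{pmatrix} v_1^{(1)} & v_1^{(2)} & \ldots & v_1^{(r)} \\ \vdots & \vdots & & \vdots \\ v_r^{(1)} & v_r^{(2)} & \ldots & v_r^{(r)} \end{pmatrix}$$

be the matrix of eigenvectors of the covariance matrix of $Y$.

The operations standard for principal component analysis, that is, computing the principal components,

$$U = V^T Y = (u_1, \ldots, u_r)^T,$$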

and reconstruction of the initial (centered) sample based on a selected number $l$ of principal components,

$$\tilde{Y} = \left(v^{(i_1)}, \ldots, v^{(i_l)}\right) \begin{pmatrix} u_{i_1} \\ \vdots \\ u_{i_l} \end{pmatrix},$$

where $0 < i_1 < i_2 < \ldots < i_l \le r$, can be applied as usual. After reconstruction of the matrix $\tilde{X} = \tilde{Y} + \bar{X}$, where $\bar{X}$ is a matrix with all columns equal to $\bar{x}$, the initial sequence is reconstructed by averaging over the diagonals of $\tilde{X}$:

$$\tilde{f}_s = \begin{cases} \dfrac{1}{s} \sum_{i=1}^{s} \tilde{x}_{i,\,s-i+1}, & 1 \le s \le r, \\[1ex] \dfrac{1}{r} \sum_{i=1}^{r} \tilde{x}_{i,\,s-i+1}, & r \le s \le n, \\[1ex] \dfrac{1}{N-s+1} \sum_{i=1}^{N-s+1} \tilde{x}_{i+s-n,\,n-i+1}, & n \le s \le N. \end{cases}$$

Thus, we have presented a method of analysis of time series which we call "caterpillar". The first reference to this name and also the first numerical examples go back to [1]. Note that a similar method was investigated from the geometrical point of view in [2]. Our approach differs from those mentioned above by the consistent application of the methods of functional analysis. We have applied this method to many practical problems, and the method proved to be very powerful in the analysis of time series, particularly nonstationary and short ones, with the value of $N$ starting at 20. (It is well known that such time series are hard to analyse.) The method has been generalized to multivariate time series and random fields. In the next section we present some results concerning the theoretical properties of the method.

3. Some Results of Theoretical Study

Let us describe the method in the case of functions of a continuous argument. We first need to transform the interval $[0, t]$ into a rectangle $\pi$ and "transfer" $f$ from $[0, t]$ to $\pi$. Formally this operation can be defined with the help of a mapping $\theta : \pi \to [0, t]$ and consideration of the two-variate function $g = f \circ \theta$ instead of $f$. In the standard variant of the additive caterpillar, we have $\theta(x, s) = x + s$, $(x, s) \in \pi = [0, \tau] \times [0, t - \tau]$ and $g(x, s) = f(x + s)$.

The function $g$ can often be considered as the kernel of an integral operator which happens to be a Hilbert-Schmidt operator and therefore possesses a number of attractive features. In particular, $g$ can be expanded with respect to two orthogonal sequences of basis functions,

$$g = \sum_n \psi_n \otimes \varphi_n. \tag{2}$$

Selecting several terms in the expansion and projecting the two-dimensional function back to $[0, t]$ we thus get one or several terms in (1), which should then be interpreted and analysed.

Let us introduce one notion. Functions $f_1$ and $f_2$ are separated in the bi-orthogonal expansion (2) if the corresponding fields $g^{(1)}$ and $g^{(2)}$ satisfy the following conditions:

$$\int g^{(1)}(x, s)\, g^{(2)}(y, s)\, ds = 0$$

for almost all $(x, y)$, and

$$\int g^{(1)}(x, u)\, g^{(2)}(x, v)\, dx = 0$$

for almost all $(u, v)$.

We formulate different features of the method, considering for simplicity only the additive case $\theta(x, s) = x + s$, $(x, s) \in \pi = [0, \tau] \times [0, t - \tau]$.

When $f_1$ and $f_2$ are continuous functions, the separability condition follows, under mild conditions on $f_1$ and $f_2$, from the following two equalities:

$$\int_0^{\tau} f_1(\delta + z) f_2(z)\, dz = 0$$

for any $\delta \in [0, t - \tau]$,

$$\int_0^{t-\tau} f_1(a + y) f_2(y)\, dy = 0$$

for every $a \in [0, \tau]$, and two "periodicity conditions":

$$f_1(\delta + \tau + v) f_2(\tau + v) = f_1(\delta + v) f_2(v)$$

for all $v \in [0, t - \tau]$ and $\delta \in [-v, t - \tau - v]$,

$$f_1(a + t - \tau + y) f_2(t - \tau + y) = f_1(a + y) f_2(y)$$

for all $y \in [0, \tau]$ and $a \in [-y, \tau - y]$.
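A discrete analogue of the separability conditions can be tried numerically. In the sketch below, which is an illustration under stated assumptions rather than the authors' procedure, the two integral conditions are replaced by the row products $X_1 X_2^T$ and column products $X_1^T X_2$ of the delay matrices of two series; the helper names and the test series (a constant and a cosine of period `P`) are hypothetical choices.

```python
import math

def delay_matrix(series, r):
    """Noncentered matrix of delay observations: X[i][k] = series[k + i] (0-based)."""
    n = len(series) - r + 1
    return [[series[k + i] for k in range(n)] for i in range(r)]

def diagonal_average(X):
    """Average a matrix over its anti-diagonals i + k = const to recover a series."""
    r, n = len(X), len(X[0])
    sums, counts = [0.0] * (r + n - 1), [0] * (r + n - 1)
    for i in range(r):
        for k in range(n):
            sums[i + k] += X[i][k]
            counts[i + k] += 1
    return [s / c for s, c in zip(sums, counts)]

def max_cross_product(X1, X2):
    """Largest |entry| among the row products X1 X2^T and column products X1^T X2."""
    r, n = len(X1), len(X1[0])
    rows = max(abs(sum(X1[i][k] * X2[j][k] for k in range(n)))
               for i in range(r) for j in range(r))
    cols = max(abs(sum(X1[i][k] * X2[i][l] for i in range(r)))
               for k in range(n) for l in range(n))
    return max(rows, cols)

# Assumed example: a constant and a cosine whose period P divides both r and n.
P, r, n = 4, 8, 12
N = r + n - 1
f1 = [1.0] * N
f2 = [math.cos(2 * math.pi * k / P) for k in range(N)]
```

With these choices `max_cross_product(delay_matrix(f1, r), delay_matrix(f2, r))` vanishes up to rounding, whereas a window length such as `r = 7` that is not a multiple of the period breaks the orthogonality. `diagonal_average` applied to an unmodified delay matrix returns the original series, which is the discrete reconstruction step of Section 2.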

Assume now that $(\Omega, \mathcal{F}, P)$ is some probability space and $f_1(s)$, $f_2(s)$ are two random processes defined for $s \ge 0$. Define $f = f_1 + f_2$ and, letting $\tau = \tau(t) \in (0, t)$, define the random fields $g(x, y) = f(x + y)$, $g^{(1)}(x, y) = f_1(x + y)$ and $g^{(2)}(x, y) = f_2(x + y)$. Let us consider the noncentered correlation coefficients $\rho^{(1)}$ and $\rho^{(2)}$ between the random fields $g^{(1)}$ and $g^{(2)}$. We shall say that the random processes $f_1$ and $f_2$ are stochastically separable when $t \to \infty$ if the correlation coefficients $\rho^{(1)}(u, v)$ and $\rho^{(2)}(x, y)$ converge in probability to 0 when $t \to \infty$ for almost all $(u, v)$ and $(x, y)$.

Let now $f(s)$, $s \ge 0$, be some deterministic function from $L_2([0, \infty))$ and $\xi(s)$ be a random process interpreted as a "pure noise". Let $E\xi(s) = 0$ for any $s$ and $R_\xi(x, y)$ be the covariance function of $\xi(s)$. Assume (i) for any $s \in [0, \infty)$ there exists $\delta = \delta(s) > 0$ such that

P(f  (^{x->r  s)dxlT  <S) 

Jo 

for  T  ->■  oo;  and  (ii)  for  any  u,v  £  R 
T  r'T' 

^  [  dx  I  dy fT{x,u)fT{y,u)R({x+v,y+v)-^0  (3) 

Jo  Jo 

when  T  oo,  where 

Mx,  s)  =  fix  d-  Piv  +  s)  dy . 

The  following  theorem  holds. 

Theorem. If $\tau \to \infty$, $t - \tau \to \infty$, and the conditions (i) and (ii) hold, then $f$ and $\xi$ are stochastically separable when $t \to \infty$.

Note  that  (i)  and  (ii)  hold  if  the  process  ^(s)  is  sta¬ 
tionary, 

Lj^e{s)ds^RiiO)>(^ 

when  T  — )>  cx» , 

^  dx  dy\R(ix-y)\^0, 

and  /  is  bounded  and  satisfies 

liminf  f  f^{x  +  s)  dar  >  0. 

T-¥oo  T  Jq 

If  /  is  unbounded  then  the  sufficient  conditions  be¬ 
come  different.  For  example,  if  /  is  linear  then  (3)  can 
be  rewritten  as 

^  dx  j  dy(a:+l)(y+l)l-R«(!>^-y)l  =  o(^^)> 


T  oo. 

Theoretical  results  are  in  a  very  good  agreement 
with  the  numerical  results:  when  the  number  of  ob¬ 
servations  is  large  then  simple  trend  functions,  like  ex¬ 
ponential  and  trigonometric  functions,  can  usually  be 
explicitly  seen  in  the  first  components  of  (1). 


4. Multiplicative "Caterpillar"

In this section we consider the multiplicative variant of the "Caterpillar" method. Although this case is not of great practical value, it nevertheless shows that a choice of the function $\theta(x, s)$ different from $x + s$ can also lead to interesting results.

Let $f$ be given on the interval $[-t, t]$, and let the sets $D_1$ and $D_2$ have the form $D_1 = [-\tau, \tau]$, $D_2 = [-t/\tau, t/\tau]$, with $\tau \in (0, t)$ and $\theta(x, s) = xs$, $x \in D_1$, $s \in D_2$. In the noncentered case, that is when $Y = X$, $g(x, s) = f(xs)$.

The separation conditions can be written as

$$\int_{-\tau}^{\tau} f_1(xu) f_2(xv)\, dx = 0 \tag{4}$$

for any $u, v \in [-t/\tau, t/\tau]$ and

$$\int_{-t/\tau}^{t/\tau} f_1(xs) f_2(ys)\, ds = 0$$

for any $x, y \in [-\tau, \tau]$.

This implies that if one of the functions $f_1$, $f_2$ is even and the other is odd, then these two functions are always separable. It is also possible to demonstrate that the above condition is not only sufficient, but also necessary.
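The even/odd statement can be illustrated with a simple quadrature of the first separation integral over a symmetric interval. This is only a numerical sketch: the midpoint rule, the function names and the example functions (cosine, cube, square) are assumptions introduced here.

```python
import math

def integrate_symmetric(h, half_width, steps=2000):
    """Midpoint rule for the integral of h over [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    return sum(h(-half_width + (j + 0.5) * dx) for j in range(steps)) * dx

def separation_integral(f1, f2, u, v, tau):
    """Integral of f1(x u) * f2(x v) over x in [-tau, tau]."""
    return integrate_symmetric(lambda x: f1(x * u) * f2(x * v), tau)

def cube(z):
    """An odd test function."""
    return z ** 3

def square(z):
    """An even test function."""
    return z ** 2
```

For an even and an odd function, for instance `separation_integral(math.cos, cube, u, v, tau)`, the integrand is odd in $x$ and the integral vanishes for every $u$ and $v$; for two even functions such as `math.cos` and `square` it generally does not.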
Abstract

We pose the following optimization: Given $y = \{y(n)\}_{n=0}^{N-1} \in R^N$, find a finite-alphabet $\hat{x} = \{\hat{x}(n)\}_{n=0}^{N-1} \in \mathcal{A}^N$ that minimizes $d(x, y) + g(x)$ subject to: $x$ satisfies a hard structural (syntactic) constraint, e.g., $x$ is piecewise constant of plateau run-length $\ge M$, or locally monotonic of lomo-degree $\alpha$. Here, $d(x, y) = \sum_{n=0}^{N-1} d_n(y(n), x(n))$ measures fidelity to the data, and is known as the noise term, and $g(x) = \sum_{n=1}^{N-1} g_n(x(n), x(n-1))$ measures smoothness-complexity of the solution. This optimization represents the unification and outgrowth of several digital nonlinear filtering schemes, including, in particular, digital counterparts of Weak Continuity (WC) [6, 7, 2] and Minimum Description Length (MDL) [4] on one hand, and nonlinear regression, e.g., VORCA filtering [11] and Digital Locally Monotonic Regression [10], on the other. It is shown that the proposed optimization admits an efficient Viterbi-type solution and, in terms of performance, combines the best of both worlds.


1  Introduction 

One of the classic problems in the true spirit of nonlinear filtering is that of detecting and estimating edges in noise. Among the great many approaches proposed so far, a particularly noteworthy one is the (nonconvex) variational Weak Continuity (WC) paradigm of Mumford-Shah [6, 7] and Blake-Zisserman [2] (see also the excellent recent book by Morel and Solimini [5]). Weak continuity is a rigorous paradigm for edge detection, which attempts to fit piecewise-smooth candidate "interpretations" to the observable data (thus the term weak continuity).

In real life we nowadays most often deal with digital data, i.e., sequences of finite-alphabet variables. Following Blake and Zisserman [2], we present a digital version of discrete-time WC. Given a (generally real-valued) sequence of finite extent $y = \{y(n)\}_{n=0}^{N-1} \in R^N$, the problem is to find a finite-alphabet sequence, $x = \{x(n)\}_{n=0}^{N-1} \in \mathcal{A}^N$ (the "reproduction process"), that minimizes

$$\sum_{n=0}^{N-1} (y(n) - x(n))^2 + \sum_{n=1}^{N-1} h_{\alpha,\lambda_{wc}}(x(n) - x(n-1)),$$

where

$$h_{\alpha,\lambda_{wc}}(t) = \begin{cases} \lambda_{wc}^2 t^2, & |t| < \sqrt{\alpha}/\lambda_{wc}, \\ \alpha, & \text{otherwise.} \end{cases}$$

There exist essentially two ways to go about solving this problem: Dynamic Programming (DP) [1], and the so-called Graduated Non-Convexity (GNC) algorithm [2]. For one-dimensional data, DP is probably the best way to go. According to Blake and Zisserman [2], Papoulias [8] was the first to implement a DP WC algorithm. The drawback of DP is that it does not generalize to higher dimensions, for lack of total ordering. The GNC, by comparison, carries over quite effortlessly to higher dimensions.

A related optimization has been advocated by Leclerc [4], based on the Minimum Description Length (MDL) principle of Rissanen; the goal is the minimization of

$$\sum_{n=0}^{N-1} \left( \frac{y(n) - x(n)}{\sigma} \right)^2 + \lambda_{mdl} \sum_{n=1}^{N-1} \left[ 1 - \delta(x(n) - x(n-1)) \right],$$

where $\delta$ is the Kronecker delta function, and $\sigma^2$ is the noise variance. Here, $\lambda_{mdl} > 0$.


2  Unification  and  Motivation 

Both WC and MDL seek to minimize a nonconvex cost of the following general form:

$$V(y, x) = \sum_{n=0}^{N-1} d_n(y(n), x(n)) + \sum_{n=1}^{N-1} g_n(x(n), x(n-1)).$$

In the digital world, Leclerc's MDL formulation is a special case of WC. Indeed, if $\lambda_{wc}$ is sufficiently large (i.e., $\lambda_{wc} > \sqrt{\alpha}$), then, for $t$ integer, $h_{\alpha,\lambda_{wc}}(t) = \alpha [1 - \delta(t)]$. If, in addition, $\alpha = \lambda_{mdl} \sigma^2$, then WC reduces to Leclerc's MDL approach.

Both WC and Leclerc's MDL approach are powerful and meritorious paradigms; however, both share a nontrivial shortcoming: they are not robust with respect to outliers, in the sense of being susceptible to noise-induced "impulses". Consider a single such outlier, i.e., a Kronecker delta of height $A$. If $(A/\sigma)^2 > 2\lambda_{mdl}$, then Leclerc's MDL approach will preserve this "impulse"; similarly, if $\lambda_{wc} > \sqrt{\alpha}$ and $A^2 > 2\alpha$, then WC will also preserve it. Observe that these statements should be interpreted as follows: for each given choice of the respective optimization parameter(s), one can find a sufficiently large $A$ which forces both "filters" to preserve "impulses" of height $\ge A$. In the context of edge detection in impulsive noise, this behavior is undesirable; these "impulses", no matter how powerful, should not be preserved [12].

"Traditional" nonlinear filters (e.g., the root of the median) are robust with respect to outliers. This robustness stems from the fact that the implicit goal of these filters is to enforce (albeit suboptimally) "hard" structural (syntactic) constraints on the data, e.g., of the type $x$ is piecewise constant of plateau run-length $\ge M$, or locally monotonic of lomo-degree $\alpha$. How to optimally enforce such constraints has been the subject of previous work by the first author in so-called VORCA filtering [11] and digital locally monotonic regression [10]. VORCA filtering amounts to solving:

$$\text{minimize} \quad \sum_{n=0}^{N-1} d_n(y(n), x(n)) \qquad \text{subject to:} \quad x = \{x(n)\}_{n=0}^{N-1} \in P_M^N,$$

where $P_M^N$ is the set of all sequences of $N$ elements of $\mathcal{A}$ which are piecewise constant of plateau (run) length $\ge M$.

A real-valued sequence (string), $x$, of length $N$, is locally monotonic of degree $\alpha \le N$ (or lomo-$\alpha$, or simply lomo in case $\alpha$ is understood) if each and every one of its substrings of $\alpha$ consecutive symbols is monotonic. Local monotonicity appears in the study of the set of root signals of the median filter [3]; it constrains the roughness of a signal by limiting the rate at which the signal undergoes changes of trend (increasing to decreasing or vice versa). In effect, it limits the frequency of oscillations, without limiting the magnitude of jump level changes that the signal exhibits. Local monotonicity implies a different notion of smoothness, as compared to, e.g., limiting the support of the Fourier transform; the latter imposes a limit on both the frequency of oscillations and the magnitude of jump level changes.
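The definition of local monotonicity translates directly into a membership test. The following sketch, with hypothetical function names, checks every window of `alpha` consecutive samples for monotonicity (non-decreasing or non-increasing).

```python
def is_monotonic(segment):
    """True if the segment is non-decreasing or non-increasing."""
    nondecreasing = all(a <= b for a, b in zip(segment, segment[1:]))
    nonincreasing = all(a >= b for a, b in zip(segment, segment[1:]))
    return nondecreasing or nonincreasing

def is_lomo(x, alpha):
    """True if every substring of alpha consecutive symbols of x is monotonic."""
    return all(is_monotonic(x[i:i + alpha]) for i in range(len(x) - alpha + 1))
```

A narrow peak such as `[1, 2, 3, 2, 1]` fails the test for `alpha = 3`, while a flat-topped pulse such as `[1, 2, 2, 2, 1]` passes for `alpha = 3` and `alpha = 4` regardless of its height, which illustrates that the constraint limits oscillation rate rather than jump magnitude.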

In [9], Restrepo and Bovik developed an elegant mathematical framework in which they studied locally monotonic regressions in $R^N$. Digital locally monotonic regression has been proposed in [10], and it amounts to solving:

$$\text{minimize} \quad \sum_{n=0}^{N-1} d_n(y(n), x(n)) \qquad \text{subject to:} \quad x = \{x(n)\}_{n=0}^{N-1} \in \Lambda(\alpha, N, \mathcal{A}),$$

where $\Lambda(\alpha, N, \mathcal{A})$ is the set of all sequences of $N$ elements of $\mathcal{A}$ which are locally monotonic of lomo-degree $\alpha$ [10]. Both approaches are robust, in the sense of suppressing impulse-like inputs while retaining "true" (consistent) edge signals. However, neither takes into account the significance of level changes ("discontinuities") in the solution, i.e., they may declare an edge even when the two resulting levels are very close. This is often undesirable, and it happens exactly because the latter two approaches do not explicitly account for smoothness/complexity, i.e., unlike WC, they do not incorporate a "soft" smoothness/complexity penalty into the cost function.

3  Structurally  Robust  Weak  Continuity 

It appears quite natural, then, to combine the power of WC with the appeal and demonstrated effectiveness of "hard" structural constraints, and propose the minimization of

$$\sum_{n=0}^{N-1} d_n(y(n), x(n)) + \sum_{n=1}^{N-1} g_n(x(n), x(n-1)) \qquad \text{subject to:} \quad x \in \mathcal{S},$$

where $\mathcal{S}$ is the set of all sequences of $N$ elements of $\mathcal{A}$ satisfying some local "hard" structural constraint. Here, again, $d(x, y) = \sum_{n=0}^{N-1} d_n(y(n), x(n))$ is a fidelity measure, and $g(x) = \sum_{n=1}^{N-1} g_n(x(n), x(n-1))$ is a smoothness-complexity measure. We will refer to this optimization as Structurally Robust Weak Continuity (SR-WC). When $\mathcal{S} = P_M^N$, Runlength-Constrained Weak Continuity (RC-WC) results; similarly, if $\mathcal{S} = \Lambda(\alpha, N, \mathcal{A})$, then Locally Monotonic Weak Continuity (LM-WC) results. Both retain the unique merits of WC, are robust with respect to outliers, take complexity into consideration, and admit an efficient Viterbi-type solution. In fact, RC-WC and LM-WC can be solved using exactly the same resources and computational structures as VORCA and digital locally monotonic regression, respectively [12]. The extension to weak continuity (i.e., the incorporation of the first-order smoothness-complexity measure $g(x) = \sum_{n=1}^{N-1} g_n(x(n), x(n-1))$ into the cost functional) essentially comes "for free", due to the structure of the Viterbi Algorithm. The resulting complexity of RC-WC and LM-WC is $O((|\mathcal{A}|^2 + |\mathcal{A}|(M - 1))N)$ and $O(|\mathcal{A}|^2 \alpha N)$, respectively.
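To make the Viterbi-type solution concrete, here is a minimal dynamic-programming sketch of RC-WC. It assumes the absolute-error fidelity $d_n = |y(n) - x(n)|$ and a constant penalty `lam` per level change, and it uses a plain double loop over levels rather than an optimized trellis; the function name `rc_wc` and the state encoding are hypothetical and not taken from [12].

```python
def rc_wc(y, alphabet, M, lam):
    """Runlength-constrained weak continuity by dynamic programming (sketch).

    Minimizes sum |y[n] - x[n]| + lam * (number of level changes) over
    sequences x with values in `alphabet` whose plateaus all have length >= M.
    """
    INF = float("inf")
    K = len(alphabet)
    # State (a, r): current level index a, run length r + 1 (capped at M).
    cost = [[INF] * M for _ in range(K)]
    for a in range(K):
        cost[a][0] = abs(y[0] - alphabet[a])
    history = []  # back-pointer tables, one per time step n >= 1
    for n in range(1, len(y)):
        new = [[INF] * M for _ in range(K)]
        ptr = [[None] * M for _ in range(K)]
        for b in range(K):
            d = abs(y[n] - alphabet[b])
            for r in range(M):  # extend the current plateau
                r2 = min(r + 1, M - 1)
                c = cost[b][r] + d
                if c < new[b][r2]:
                    new[b][r2], ptr[b][r2] = c, (b, r)
            for a in range(K):  # jump away from a completed plateau
                c = cost[a][M - 1] + lam + d
                if a != b and c < new[b][0]:
                    new[b][0], ptr[b][0] = c, (a, M - 1)
        cost = new
        history.append(ptr)
    # The final plateau must be complete as well.
    a = min(range(K), key=lambda i: cost[i][M - 1])
    if cost[a][M - 1] == INF:
        raise ValueError("no feasible sequence: need len(y) >= M")
    r = M - 1
    x = [alphabet[a]]
    for ptr in reversed(history):  # trace the optimal path backwards
        a, r = ptr[a][r]
        x.append(alphabet[a])
    return x[::-1]
```

With `M = 1` the run-length constraint is vacuous and the sketch behaves like plain WC with this penalty, so an isolated strong outlier survives; with `M = 3` the same outlier is removed while a genuine level change is kept.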
By virtue of the above, efficient computation of SR-WC can be taken for granted. What is intriguing and unexplored is how to go about choosing fidelity and smoothness/complexity measures. We know that, at least for some specific choices, e.g., "classic" WC, MDL, or VORCA, we may expect very good nonlinear filtering results. The question is, can we make even better choices, and in what sense. This is partially explored in the following.

4  Example 

This  particular  example  demonstrates  the  effectiveness  of 
simple  RC-WC.  Figure  1  depicts  a  typical  input  sequence. 
This  particular  input  has  been  generated  by  adding  i.i.d. 
noise  on  some  artificial  “true”  noise-free  test  data.  The 
noise  has  been  generated  according  to  a  mixture  of  a  uniform 
distribution  and  an  “outlier”  distribution,  the  mixture  being 
heavily  weighted  in  favor  of  the  uniform  distribution,  and 
most  of  the  data  points  are  contaminated.  It  should  be 
stressed  that  we  do  not  utilize  our  exact  knowledge  of  the 
noise  model  to  fully  match  the  optimization  to  the  noise 
characteristics,  which  is  certainly  a  possibility  [11,  10,  9]. 
Instead,  as  it  will  be  explained  shortly,  we  only  use  some 
crude  noise  measurements  to  help  us  pick  reasonable  values 
for  two  optimization  parameters.  The  noise-free  test  data 
has  not  been  reproduced  on  its  own,  due  to  space  limitations; 
instead,  it  has  been  overlaid  on  the  restoration  plots,  using  a 
dashed  line.  This  is  meant  to  help  the  reader  judge  filtering 
“quality”. 

Here we take $d_n(y(n), x(n)) = |y(n) - x(n)|$, $\forall n \in \{0, 1, \ldots, N - 1\}$, and $g_n(x(n), x(n-1)) = \lambda_{wc}^2 [1 - \delta(x(n) - x(n-1))]$, $\forall n \in \{0, 1, \ldots, N - 1\}$, $\mathcal{A} = \{0, \ldots, 99\}$, $N = 512$, and $\mathcal{S} = P_M^N$.

For $M = 1$, we obtain "plain" WC, and the result for $\lambda_{wc}^2 = 25$ is depicted in Figure 2. This is excellent filtering, yet powerful outliers are preserved. We could, in principle, further increase $\lambda_{wc}^2$, thereby eventually eliminating outliers, but, at the same time, also "mending" true edges. Clearly, this is not the way to go about ameliorating this problem, for, no matter what our choice of $\lambda_{wc}^2$ is, one can always find a sufficiently powerful outlier that will fool WC.

For $\lambda_{wc}^2 = 0$, we obtain "plain" VORCA, and the result for $M = 15$ is depicted in Figure 3. This too is excellent filtering: the outliers have been effectively eliminated, yet some undesirable "weak" edges still remain. For $\lambda_{wc}^2 = 25$ and $M = 15$ we have "true" hybrid RC-WC, and the result is depicted in Figure 4. It is obvious that RC-WC combines the power of both methods: this is, indeed, almost perfect filtering.

One obvious objection may be anticipated: one may wonder about how we came up with the particular choices of $M$, $\lambda_{WC}^2$ that led to these results. In the following, we address this subject.

4.1  Selection  of  Optimization  Parameters 

We will use the following definitions. A feature (outlying burst) of width $w < M$ is a “short” arbitrary deviation from a plateau, consisting of a total of $w$ perturbed samples. A constant segment of saliency (width-strength product) $p = w \cdot H$ is a (potentially long) equidistant deviation from a plateau, i.e., a string of $w$ equal samples which differ by $H$ from the plateau level.

The following two claims refer to this particular instance of RC-WC, i.e., $d_n(y(n), x(n)) = |y(n) - x(n)|$, $\forall n \in \{0, 1, \ldots, N-1\}$, and $g_n(x(n), x(n-1)) = \lambda_{WC}^2 \, [1 - \delta(x(n) - x(n-1))]$, $\forall n \in \{0, 1, \ldots, N-1\}$. Proofs can be found in [12].

Theorem 1 Assume that $M$ is odd. RC-WC eliminates all features (outlying bursts) of width $w <$ […], regardless of strength, and the same is true for $\lambda_{WC}^2 = 0$, i.e., “plain” VORCA filtering with respect to the above choice of $d_n(\cdot, \cdot)$.

Theorem 2 RC-WC suppresses all constant segments of saliency (width-strength product) $p = w \cdot H < 2\lambda_{WC}^2$, i.e., “mends” the “weak” edges at the endpoints of such segments, and the same holds for $M = 1$, i.e., “plain” WC with respect to the above choice of $d_n(\cdot, \cdot)$, $g_n(\cdot, \cdot)$.

The overall conclusion is that this particular instance of RC-WC suppresses features of either (i) width $w <$ […] ($M$: odd), regardless of strength, or (ii) saliency (width-strength product) $p = w \cdot H < 2\lambda_{WC}^2$. This allows us to essentially separately fine-tune two important aspects of filter behavior. In a nutshell, $M$ controls outlier rejection, whereas $\lambda_{WC}^2$ controls residual ripple.
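The saliency threshold for “plain” WC ($M = 1$) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes an absolute-error fidelity term and a constant jump penalty (the hypothetical argument `penalty`), and minimizes the total cost by dynamic programming over a finite alphabet. The function name and the test signal are illustrative assumptions.

```python
def weak_continuity_fit(y, alphabet, penalty):
    """Minimize sum |y[n]-x[n]| + penalty * (number of jumps in x) by Viterbi DP.

    Assumption: no penalty is charged at the first sample.
    """
    cost = [abs(y[0] - a) for a in alphabet]
    back = []  # back[n-1][i] = best predecessor index of state i at time n
    for n in range(1, len(y)):
        best_prev = min(range(len(alphabet)), key=cost.__getitem__)
        new_cost, ptr = [], []
        for i, a in enumerate(alphabet):
            stay = cost[i]                    # keep the same level, no penalty
            jump = cost[best_prev] + penalty  # come from the cheapest level
            if stay <= jump:
                c, p = stay, i
            else:
                c, p = jump, best_prev
            new_cost.append(c + abs(y[n] - a))
            ptr.append(p)
        cost = new_cost
        back.append(ptr)
    i = min(range(len(alphabet)), key=cost.__getitem__)
    x = [alphabet[i]]
    for ptr in reversed(back):
        i = ptr[i]
        x.append(alphabet[i])
    return x[::-1]


# A constant segment of width w = 3 and strength H = 5 (saliency 15) on a plateau.
y = [10] * 8 + [15] * 3 + [10] * 8
suppressed = weak_continuity_fit(y, list(range(100)), penalty=10)  # 15 < 2 * 10
preserved = weak_continuity_fit(y, list(range(100)), penalty=5)    # 15 > 2 * 5
```

In this sketch the segment is flattened when its saliency is below twice the penalty and kept when it is above, which is the behavior Theorem 2 describes.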

5  Conclusions 

Motivated by the power of WC-based methods [6, 7, 2, 4], “complementary” previous work by the first author in optimal nonlinear filtering under “hard” structural (so-called syntactic) constraints [11, 10], and realizing that a potential shortcoming of WC could be ameliorated by introducing “hard” structural constraints, whereas a drawback of the methods of [11, 10] could be rectified by introducing “soft” weak continuity constraints, we have posed, solved, and demonstrated the effectiveness of a novel hybrid optimization, dubbed Structurally Robust Weak Continuity, combining the advantages while avoiding the shortcomings of its constituent elements. SR-WC includes its constituent elements as special cases, and inherits efficient Viterbi implementation from [11, 10]. What is most intriguing is how to go about choosing fidelity and smoothness/complexity measures. This deserves further investigation, and long-term research in this direction is currently underway.




References 

[1]  R.  Bellman.  Dynamic  Programming.  Princeton  University 
Press,  Princeton,  N.J.,  1957. 

[2]  A.  Blake  and  A.  Zisserman.  Visual  Reconstruction.  MIT 
Press,  Cambridge,  Mass.,  1987. 

[3]  N.  Gallagher.  Median  filters:  a  tutorial.  In  Proc.  IEEE  Int. 
Symp.  Circ.,  Syst.,  ISCAS-88,  pages  1737-1744,  1988. 

[4]  Y.  Leclerc.  Constructing  Simple  Stable  Descriptions  for 
Image  Partitioning.  Int.  J.  Computer  Vision,  3(1):73-102, 
1989. 

[5]  J.-M.  Morel  and  S.  Solimini.  Variational  Methods  in  Image 
Segmentation.  Birkhauser,  Boston-Basel-Berlin,  1994. 

[6] D. Mumford and J. Shah. Boundary detection by minimizing functionals. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Francisco, 1985.

[7] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Math., 42:577-685, 1989.

[8]  A.  Papoulias.  Curve  Segmentations  Using  Weak  Continuity 
Constraints.  M.Sc.  thesis,  Univ.  of  Edinburgh,  1985. 

[9] A. Restrepo and A. C. Bovik. Locally Monotonic Regression. IEEE Trans. Signal Processing, 41(9):2796-2810, Sep. 1993.

[10] N. Sidiropoulos. Fast Digital Locally Monotonic Regression. Submitted for publication, IEEE Trans. Signal Processing. Summary to appear in Proc. 1996 IEEE Int. Symp. on Circuits and Systems, May 12-15, Atlanta, GA.

[11]  N.  Sidiropoulos.  The  Viterbi  Optimal  Runlength- 
Constrained  Approximation  Nonlinear  Filter.  IEEE  Trans. 
Signal  Processing,  44(3),  March  1996. 

[12]  N.  Sidiropoulos,  J.  Baras,  and  C.  Berenstein.  Structurally 
Robust  Weak  Continuity.  Submitted  for  publication,  IEEE 
Trans.  Signal  Processing. 


Figure 1. Input sequence, $\{y(n)\}_{n=0}^{N-1}$


Figure 2. Output of digital WC, $\lambda_{WC}^2 = 25$


Figure 3. Output of VORCA, $M = 15$


Figure 4. Output of RC-WC, $M = 15$, $\lambda_{WC}^2 = 25$
Abstract

We present new finite dimensional filters for estimating the state of Markov jump linear systems, given noisy measurements of the Markov chain. Discrete time as well as continuous time models are considered. A robust version of the continuous time filters is used to derive a discretization which links the continuous and discrete time results. Simulations compare the robust discretization with direct numerical solutions of the filtering equations. The new filters have applications in the passive tracking of maneuvering targets and speech coding.


1.  Introduction 

Consider a discrete-time Markov jump linear system whose (vector) state equation evolves as:

$$s_n = C(X_{n-1}) \, s_{n-1} + v_n$$

where $X_n$ denotes a finite state homogeneous Markov chain and $v_n$ is a zero mean stochastic process which is independent of the process $X_n$. Assume that we have noisy measurements $y_n$ of the Markov chain $X_n$ in white Gaussian noise. In this paper we show how to compute filtered estimates $\hat{s}_n$ of the state $s_n$, i.e., $\hat{s}_n = E\{s_n | \mathcal{Y}_n\}$ where $\mathcal{Y}_n$ denotes the filtration generated by the observations.

Instead of noisy measurements of the Markov chain $X_n$, suppose that only noisy measurements of $s_n$ are available. In such a case, it is well known that the optimal state filter is infinite dimensional [1]. Indeed the optimal state estimates would involve a computational cost that is exponential in the data length. Sub-optimal finite dimensional approximations are given in [1]. However, as we show in this paper, given noisy observations $y_n$ of the Markov chain, the optimal state filter for $s_n$ is finite dimensional. We also derive continuous-time versions of the filters.

The  key  contributions  of  this  paper  can  be  summarized 
as  follows: 


•This  work  was  partially  supported  by  ATERB  and  ARC 
grants,  the  Cooperative  Research  Centre  for  Sensor,  Signal  and 
Information  Processing  (CSSIP)  and  a  Telstra  Research  Labora¬ 
tories  Postgraduate  Fellowship. 


1. Finite Dimensional Filters: In Sec. 2 we derive finite dimensional filters for state estimation of discrete-time Markov jump linear systems given noisy observations of the Markov chain. These derivations are based on the reference probability method and thus lead to filtering equations in unnormalized or Zakai form. Finite dimensional filters are presented for the state estimation problem in continuous time in Sec. 3.

2. Robust Discretization: Having derived both continuous and discrete-time filters independently, our next contribution is to show that an appropriate robust discretization of the continuous-time filters results in the discrete-time filters. This is the subject of Sec. 4.

3. Numerical Examples: Using computer simulations in Sec. 5, we compare the performance of robust discretized filters with two standard numerical approximations, namely the Euler-Maruyama and Milstein algorithms. The robust scheme is seen to outperform these methods as the discretization step size is increased.

2.  Discrete-Time  Filters 

2.1. Signal Model and Aim

Let $X_l$, $l \in \mathbb{Z}^+ = \{1, 2, \ldots\}$ denote an $S$-state discrete-time Markov chain defined on a probability space $(\Omega, \mathcal{F}, P)$ with state space $\{e_1, \ldots, e_S\}$ where $e_i$ denotes the unit $S$-vector with 1 in the $i$th position. Denote the transition probabilities $a_{ji} = P(X_n = e_j | X_{n-1} = e_i)$ and $A$ for the $S \times S$ matrix $(a_{ji})$, $1 \le i, j \le S$. Note that $\sum_{j=1}^{S} a_{ji} = 1$ for $1 \le i \le S$.

Consider the following jump linear system driven by $X_n$:

$$s_n = C(X_{n-1}) \, s_{n-1} + v_n \qquad (1)$$

where $s_n, v_n \in \mathbb{R}^M$ and $v_n$ is a zero mean process independent of the Markov chain $X_n$. Assume that $X_n$ is observed indirectly via the scalar process $y_n$ as follows:

$$y_n = \langle g, X_n \rangle + w_n \qquad (2)$$

where $g = (g_1 \; g_2 \; \cdots \; g_S)'$ is the vector of levels of the Markov chain. Also $\langle \cdot, \cdot \rangle$ denotes the scalar product in $\mathbb{R}^S$. $w_n$ is white Gaussian noise with variance $\sigma_w^2$, independent of the processes $X_n$ and $v_n$.

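For any $n \in \mathbb{Z}^+$, let $\mathcal{F}_n$ denote the sigma field generated by $X_m$, $m \le n$. Let $\mathcal{Y}_n$ denote the sigma field generated by $y_m$, $m \le n$. Let $\mathcal{G}_n = \mathcal{F}_n \vee \mathcal{Y}_n$ denote the sigma field generated by $(X_m, y_m)$, $m \le n$.

For any measurable process $\{\phi_n\}$, let $\hat{\phi}_n = E\{\phi_n | \mathcal{Y}_n\}$ where $E$ denotes expectation under measure $P$.

Aim: For fixed known values of $A$, $g$ and $\sigma_w^2$ and of the initial state $s_0$, compute the filtered estimates $\hat{s}_n = E\{s_n | \mathcal{Y}_n\}$.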

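2.2. Zakai State Filter for Jump Linear System

Define the probability measure $P_0$ such that the $\mathcal{G}_n$ restriction of the Radon-Nikodym derivative of $P$ with respect to $P_0$ is

$$\left. \frac{dP}{dP_0} \right|_{\mathcal{G}_n} = \Lambda_n = \prod_{m=1}^{n} \exp\left( - \frac{\langle g, X_m \rangle^2 - 2 y_m \langle g, X_m \rangle}{2 \sigma_w^2} \right)$$

If $\phi_n$, $n \in \mathbb{Z}^+$ is a $\mathcal{G}$ measurable sequence, then an abstract version of Bayes theorem states

$$\hat{\phi}_n = E\{\phi_n | \mathcal{Y}_n\} = \frac{E_0\{\Lambda_n \phi_n | \mathcal{Y}_n\}}{E_0\{\Lambda_n | \mathcal{Y}_n\}}$$

where $E_0$ denotes expectation with respect to $P_0$. Define the un-normalized conditional expectation $\sigma_n(\phi_n) = E_0\{\Lambda_n \phi_n | \mathcal{Y}_n\}$ and let

$$b_i(y_m) = \exp\left( - \frac{g_i^2 - 2 y_m g_i}{2 \sigma_w^2} \right), \quad i = 1, \ldots, S.$$

Theorem 2.1 The filtered state is given by

$$\sigma_n(s_n) = \sum_{i=1}^{S} \sigma_n(s_n X_n(i)), \qquad \sigma_n(s_n X_n(i)) = b_i(y_n) \sum_{j=1}^{S} C(e_j) \, a_{ij} \, \sigma_{n-1}(s_{n-1} X_{n-1}(j))$$

Proof See [2]

To compute $\hat{s}_n$ we use Thm. 2.1 and the normalization:

$$\hat{s}_n = \sigma_n(s_n) / \sigma_n(1) \quad \text{where} \quad \sigma_n(1) = \sum_{j=1}^{S} \sigma_n(X_n(j))$$

where the un-normalized state estimate $\sigma_n(X_n(j))$ is computed using the standard HMM state filter [3]

$$\sigma_n(X_n(j)) = b_j(y_n) \sum_{i=1}^{S} a_{ji} \, \sigma_{n-1}(X_{n-1}(i)) \qquad (4)$$

3. Continuous-time Filters

3.1. Signal Model and Aim

Let $X_t$, $t \ge 0$ be a continuous-time Markov chain defined on a probability space $(\Omega, \mathcal{F}, P)$ with state space $\{e_1, \ldots, e_S\}$. Let the transition rate matrix (infinitesimal generator) be $A$. That is, defining $p_t^i = P(X_t = e_i)$, $1 \le i \le S$, the probability distribution $p_t = (p_t^1 \; p_t^2 \; \cdots \; p_t^S)'$ satisfies the forward equation $dp_t/dt = A p_t$. Also note that $\sum_{j=1}^{S} a_{ji} = 0$ for $1 \le i \le S$.

Consider the following Markov jump linear system

$$s_t = s_0 + \int_0^t C(X_r) \, s_r \, dr + v_t \qquad (5)$$

where $s_t, v_t \in \mathbb{R}^M$, $s_0$ is known and $v_t$ is a zero mean process independent of $X_t$. Also for each given $X_r$, $C(X_r)$ is a known $M \times M$ matrix.

Assume that $X_t$ is observed indirectly via the process $y_t$ where

$$y_t = \int_0^t \langle g, X_r \rangle \, dr + w_t$$

where $w_t$ is a standard Wiener process independent of the processes $X_t$ and $v_t$.

Let $\mathcal{F}_t$ and $\mathcal{Y}_t$ denote, respectively, the sigma-algebras generated by $X_s$, $s \le t$ and $y_s$, $s \le t$. Also let $\mathcal{G}_t = \mathcal{Y}_t \vee \mathcal{F}_t$.

Aim: Compute the filtered estimate $\hat{s}_t = E\{s_t | \mathcal{Y}_t\}$ a.s. where $E$ denotes expectation under measure $P$.

3.2. Zakai State Filter for Jump Linear System

Define the probability measure $P_0$ such that the $\mathcal{G}_t$ restriction of the Radon-Nikodym derivative of $P$ with respect to $P_0$ is

$$\left. \frac{dP}{dP_0} \right|_{\mathcal{G}_t} = \Lambda_t = \exp\left( \int_0^t \langle g, X_r \rangle \, dy_r - \frac{1}{2} \int_0^t \langle g, X_r \rangle^2 \, dr \right)$$

Note that under $P_0$, $y_t$ is a standard Wiener process independent of the process $X_t$ [4].

Now for any measurable process $H_t$ we write $\sigma_t(H_t) = E_0\{\Lambda_t H_t | \mathcal{Y}_t\}$ where $E_0$ denotes expectation with respect to $P_0$. An abstract version of Bayes' theorem then states that

$$\hat{H}_t = E\{H_t | \mathcal{Y}_t\} = \sigma_t(H_t) / \sigma_t(1)$$

Theorem 3.1 The Zakai filter for $s_t$ defined in (5) is

$$\sigma_t(s_t X_t(i)) = s_0 X_0(i) + C(e_i) \int_0^t \sigma_r(s_r X_r(i)) \, dr + \sum_{j=1}^{S} \int_0^t \sigma_r(s_r X_r(j)) \, a_{ij} \, dr + \int_0^t g_i \, \sigma_r(s_r X_r(i)) \, dy_r \qquad (6)$$

Proof See [2]

To obtain $\hat{s}_t$ from (6) we use

$$\hat{s}_t = \sigma_t(s_t) / \sigma_t(1) \quad \text{where} \quad \sigma_t(1) = \sum_{j=1}^{S} \sigma_t(X_t(j))$$

where the un-normalized state estimate $\sigma_t(X_t(j))$ is computed using the standard HMM state filter [4],

$$\sigma_t(X_t(i)) = X_0(i) + \sum_{j=1}^{S} \int_0^t \sigma_r(X_r(j)) \, a_{ij} \, dr + \int_0^t g_i \, \sigma_r(X_r(i)) \, dy_r \qquad (7)$$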
4. Rapprochement of Continuous and Discrete-time Filters


Given a continuous-time Markov jump linear system and a realization of the observation process, we are interested in obtaining a computable approximation of the continuous time filters. We consider two approaches:

1. One way to proceed is to discretize robust versions of the continuous-time filters derived in Sec. 3. This is discussed in Sec. 4.1 and 4.2.

2. Alternatively, the continuous-time jump linear system can be approximated by a discrete-time system and the discrete-time filters of Sec. 2 applied. This is discussed in Sec. 4.3.

The aim of this section is to establish the equivalence of these two approaches. In particular, we will show that a standard first-order discretization of the robust filter is identical to the discrete-time filter of Sec. 2 applied to a discrete-time approximation of the continuous-time Markov jump linear system.

4.1.  Robust  Continuous-time  Filters 


In this subsection we derive a version of the continuous-time filter which depends continuously on the observation path. This so-called robust filter [5] involves the solution of an ordinary differential equation as opposed to the stochastic differential equation of (6). This robust reformulation of the filtering equations is also applied in [6].

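Let $\phi_t^i = \exp\left( g_i y_t - \tfrac{1}{2} g_i^2 t \right)$. Then we can re-express the Zakai filter (6) in robust form as follows:

Theorem 4.1 Suppose $\bar{\sigma}_t(X_t(i))$ and $\bar{\sigma}_t(s_t X_t(i))$ are the solutions of the ordinary linear differential equations

$$\frac{d}{dt} \bar{\sigma}_t(X_t(i)) = \frac{1}{\phi_t^i} \sum_{j=1}^{S} a_{ij} \, \phi_t^j \, \bar{\sigma}_t(X_t(j)) \qquad (8)$$

$$\frac{d}{dt} \bar{\sigma}_t(s_t X_t(i)) = C(e_i) \, \bar{\sigma}_t(s_t X_t(i)) + \frac{1}{\phi_t^i} \sum_{j=1}^{S} a_{ij} \, \phi_t^j \, \bar{\sigma}_t(s_t X_t(j)) \qquad (9)$$

respectively.

Then for all $0 \le t \le T$

$$\sigma_t(s_t X_t(i)) = \phi_t^i \, \bar{\sigma}_t(s_t X_t(i))$$

defines a locally Lipschitz continuous version of $E[s_t X_t(i) | \mathcal{Y}_t]$. That is,

$$|\sigma_t(s_t X_t(i))[y_1] - \sigma_t(s_t X_t(i))[y_2]| \le K \|y_1 - y_2\|$$

where

$$\|y\| = \sup_{0 \le t \le T} |y(t)|,$$

$|\cdot|$ is the Euclidean norm of a vector, and $K$ depends on $\|y_1\|$, $\|y_2\|$.

Proof See [2]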


4.2.  Explicit  Time  Discretization  of  Robust  Filters 


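In what follows we consider a regular partition of the interval $[0, T]$ into $N$ intervals of length $\Delta$ with $t_n = n\Delta$, $n = 0, \ldots, N$.

If we use an explicit first-order Euler approximation for (9) and transform back to the standard unnormalized conditional expectations, we arrive at the following approximation for (6)

$$\sigma_n(s_n X_n(i)) = \frac{\phi_{t_n}^i}{\phi_{t_{n-1}}^i} \left[ \sigma_{n-1}(s_{n-1} X_{n-1}(i)) + \Delta \, C(e_i) \, \sigma_{n-1}(s_{n-1} X_{n-1}(i)) + \Delta \sum_{j=1}^{S} a_{ij} \, \sigma_{n-1}(s_{n-1} X_{n-1}(j)) \right]$$

where

$$\frac{\phi_{t_n}^i}{\phi_{t_{n-1}}^i} = \exp\left( g_i (y_{t_n} - y_{t_{n-1}}) - \tfrac{1}{2} g_i^2 \Delta \right).$$

A similar procedure leads to a robust, explicit discretization for (7) as given in [5, 6]

$$\sigma_n(X_n(i)) = \frac{\phi_{t_n}^i}{\phi_{t_{n-1}}^i} \left[ \sigma_{n-1}(X_{n-1}(i)) + \Delta \sum_{j=1}^{S} a_{ij} \, \sigma_{n-1}(X_{n-1}(j)) \right] \qquad (11)$$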

4.3. Discrete-Time Approximate Model

We now wish to consider a discrete-time Markov jump linear system that approximates the continuous-time one. We use superscripts $c$ and $d$ to distinguish between continuous and discrete-time parameters and signals.

Consider the discrete-time Markov jump linear system with

$$A^d = I + \Delta A^c$$
$$C^d(\cdot) = I + \Delta C^c(\cdot)$$
$$y_n^d = (y_{t_n}^c - y_{t_{n-1}}^c) / \Delta$$
$$(\sigma_w^d)^2 = 1 / \Delta$$

The discrete-time filter equations of Sec. 2 become

$$\sigma_n(s_n X_n(i)) = b_i \sum_{j=1}^{S} (I + \Delta \, C^c(e_j)) (\delta_{ij} + \Delta \, a_{ij}^c) \, \sigma_{n-1}(s_{n-1} X_{n-1}(j))$$

$$\sigma_n(X_n(i)) = b_i \sum_{j=1}^{S} (\delta_{ij} + \Delta \, a_{ij}^c) \, \sigma_{n-1}(X_{n-1}(j))$$

Note that $b_j = \exp\left( g_j (y_{t_n}^c - y_{t_{n-1}}^c) - \tfrac{1}{2} g_j^2 \Delta \right) = \phi_{t_n}^j / \phi_{t_{n-1}}^j$. Finally expanding the above equations and neglecting the $O(\Delta^2)$ terms, we obtain identical filters to those obtained via explicit discretization in Sec. 4.2. The important conclusion then is that a first order discretization of the robust continuous-time filter is equivalent to the discrete-time filter derived in Sec. 2.
Figure 1. Mean Square Error: $\Delta = 0.1$


5.  Numerical  Examples 


Rather  than  using  this  robust  discretization  scheme,  the 
filtering  equations  (7)  and  (6)  may  be  directly  discretized 
using  standard  techniques  for  the  numerical  solution  of 
stochastic  differential  equations. 

In this section we compare the robust discretization with two direct techniques: the Euler-Maruyama and Milstein schemes. Roughly speaking, the Euler scheme is a first order approximation (more precisely it is an order 0.5 strong Ito-Taylor approximation) while the Milstein scheme is a second order discretization scheme (an order 1 strong Ito-Taylor approximation) [7, Chapter 10].
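For readers unfamiliar with the two direct schemes, the following sketch shows a single step of each on a scalar test equation $dx = a x \, dt + b x \, dW$. The test equation, its coefficients, and the function names are assumptions chosen for illustration; they are not the filtering equations (6) and (7) themselves.

```python
import math
import random


def euler_maruyama_step(x, a, b, dt, dw):
    """Order 0.5 strong scheme for dx = a*x dt + b*x dW."""
    return x + a * x * dt + b * x * dw


def milstein_step(x, a, b, dt, dw):
    """Order 1 strong scheme: Euler-Maruyama plus the Ito correction term."""
    return euler_maruyama_step(x, a, b, dt, dw) + 0.5 * b * b * x * (dw * dw - dt)


def strong_errors(a, b, x0, t_end, n_steps, n_paths, seed=0):
    """Mean absolute end-point error of both schemes against the exact solution."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    err_em = err_mil = 0.0
    for _ in range(n_paths):
        x_em = x_mil = x0
        w = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            x_em = euler_maruyama_step(x_em, a, b, dt, dw)
            x_mil = milstein_step(x_mil, a, b, dt, dw)
            w += dw
        exact = x0 * math.exp((a - 0.5 * b * b) * t_end + b * w)
        err_em += abs(x_em - exact)
        err_mil += abs(x_mil - exact)
    return err_em / n_paths, err_mil / n_paths
```

Calling `strong_errors(-1.0, 0.5, 1.0, 1.0, 10, 200)` returns the two averaged errors for a coarse step, which can be compared as the step size is varied.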

We consider a two-dimensional continuous-time jump linear system driven by a two state continuous-time Markov chain. The system parameters $A$, $C(e_1)$ and $C(e_2)$ used in the simulations are given by

-2)  ^('‘’=(2  -3)  ^<'^>=(0  0) 

For all results, the simulation period was 10 seconds and the
fast- sampled  versions  of  the  continuous- time  sample  paths 
were  generated  using  a  time  step  of  10  ^  seconds.  We  as¬ 
sume  perfect  knowledge  of  the  initial  state  of  the  Markov 
chain  and  jump  linear  system.  In  what  follows,  we  assume 
each  component  of  vt  is  an  independent  Wiener  process 
with  zero  drift  and  diffusion  coefficient  0.01. 

Fig.  1  and  2  illustrate  the  performance  of  the  robust, 
Euler  and  Milstein  discretizations  of  the  continuous-time 
filter. 

For each of the discretization step sizes, $\Delta = 0.1$ (Fig. 1), $\Delta = 0.25$ (Fig. 2), we plot the evolution of the mean square error. The mean square error values were calculated based on 100 sample path runs. For ease of comparison each run was performed using the same realization of the Markov state.

With a small discretization step size the performance of all schemes is comparable. However as the discretization step size is increased we notice that the behaviour of the Euler (first order) and Milstein (second order) schemes becomes quite erratic. In contrast, the robust discretization (first order) continues to track satisfactorily.


Figure 2. Mean Square Error: $\Delta = 0.25$ (horizontal axis: time in seconds)


6. Conclusion

In  this  paper  we  have  derived  finite  dimensional  optimal 
recursive  filters  for  estimating  the  state  of  a  Markov  jump 
linear  system  given  noisy  observations  of  the  underlying 
Markov  chain. 

We then presented a robust discretization of the continuous time filters which is based on the discretization of a robust version of the stochastic filtering equations. The robust discretization leads to a difference equation which is equivalent to that obtained using the discrete-time filters on a discrete approximation of the continuous-time model.

Simulations  illustrated  the  advantages  of  the  robust  dis¬ 
cretization  over  techniques  based  on  the  direct  numerical 
solution  of  the  stochastic  filtering  equations. 
ABSTRACT

Recent developments in the theory on the zeta function, algorithms on generalizations of Euclidean domains, and variations on equidistribution theory have led to algorithms for several classes of problems in parameter estimation that are general and very efficient. We present the theoretical justifications for these algorithms, and discuss their use in the analysis of periodic pulse trains.

1.  INTRODUCTION 

Problems in harmonic analysis and synthesis are intertwined with their applications in signal and image processing. Some recent advances in this analysis have used number theory to extend existing theories (e.g., sampling theory, fast transform computations) and develop new approaches to problems (e.g., interpolation). Number theoretic methods have also been successfully applied to the analysis of periodic point processes. The purpose of this note is to discuss several recent developments in which number theory has been used to develop algorithms for several classes of parameter estimation problems.

We first present modifications of the Euclidean algorithm which determine the period from a sparse set of noisy measurements [1, 2]. The elements of the set are the noisy occurrence times of a periodic event with (perhaps very many) missing measurements. The proposed algorithms are computationally straightforward and converge quickly. A robust version is developed that is stable despite the presence of arbitrary outliers. The Euclidean algorithm approach is justified by a theorem which shows that, for a set of randomly chosen positive integers, the probability that they do not all share a common prime factor approaches one quickly as the cardinality of the set increases. The theorem is in essence a probabilistic interpretation of the Riemann Zeta Function. In the noise-free case this implies convergence with only ten data samples, independent of the percentage of missing measurements. In the case of noisy data, simulation results show, for example, good estimation of the period from one hundred data samples with fifty percent of the measurements missing and twenty five percent of the data samples being arbitrary outliers.

We then use these algorithms in the analysis of periodic pulse trains, getting an estimate of the underlying period [6, 7]. This estimate, while not maximum likelihood, is used as initialization in a three-step algorithm that achieves the Cramer-Rao bound for moderate noise levels, as shown by comparing Monte Carlo results with the Cramer-Rao bounds. An approach using multiple independent data records is also developed that overcomes high levels of contamination.

We close by discussing our work on the deinterleaving of multiple periodic pulse trains. Here we give a variation on Weyl's Equidistribution Theorem, which shows that noisy phase-wrapped data is equidistributed on $[0, 1)$ almost surely. We then use periodogram-like operators in a multistep procedure to isolate fundamental periods.

2.  MODIFIED  EUCLIDEAN  ALGORITHMS 

Our  problem  begins  with  a  set  of  noisy  occurrence 
times  of  a  periodic  event  with  (perhaps  very  many) 
missing  measurements.  We  have  developed  modifica¬ 
tions  of  the  Euclidean  algorithm  for  determining  the 
period  from  this  set  [1],  [2].  This  problem  arises  in 
radar  pulse  repetition  interval  (PRI)  analysis,  in  bit 
synchronization  in  communications,  in  biomedical  ap¬ 
plications,  and  other  scenarios.  We  assume  our  data  is 
a  finite  set  of  real  numbers 

$$S = \{s_j\}_{j=1}^{n}, \quad \text{with} \quad s_j = k_j \tau + \phi + \eta_j, \qquad (1)$$

where $\tau$ (the period) is a fixed positive real number, the $k_j$'s are non-repeating positive integers, $\phi$ (the phase) is a real random variable uniformly distributed over the interval $[0, \tau)$, and the $\eta_j$'s are zero-mean independent identically distributed (iid) error terms. We assume that the $\eta_j$'s have a symmetric probability density function (pdf), and that $|\eta_j| < \tau/2$ for all $j$. We develop an algorithm for isolating the period of the process from this set, which we shall assume is (perhaps very) sparse. In the noise-free case our basic algorithm, given below, is equivalent to the Euclidean algorithm and converges with very high probability given only $n = 10$ data samples, independent of the number of missing measurements. We assume that the original data set is in descending order, i.e., $s_j \ge s_{j+1}$.

Modified  Euclidean  Algorithm 

1.) After the first iteration, append zero.

2.) Form the new set with elements $s_j - s_{j+1}$.

3.) Sort in descending order.

4.) Eliminate elements in $[0, \eta_0]$ from end of the set.

5.) Algorithm is done if left with a single element. Declare $\hat{\tau} = s_1$. If not done, go to (1).

Here, $0 < \eta_0 <$ […] is a noise threshold. Noise-free simulation examples demonstrate successful estimation of $\tau$ for $n = 10$ with 99.99% of the possible measurements missing. In fact, with only 10 data samples, it is possible to have the percentage of missing measurements arbitrarily close to 100%. There is, of course, a cost, in that the number of iterations the algorithm needs to converge increases with the percentage of missing measurements. In the presence of noise and false data (outliers), there is a tradeoff between the number of data samples, the amount of noise, and the percentage of outliers. The algorithm will perform well given low noise for $n = 10$, but will degrade as noise is increased. However, given more data, it is possible to reduce noise effects and speed up convergence by binning the data, and averaging across bins. Binning can be effectively implemented by using an adaptive threshold with a gradient operator, allowing convergence in a single iteration in many cases. Simulation results show, for example, good estimation of the period from one hundred data samples with fifty percent of the measurements missing and twenty five percent of the data samples being arbitrary outliers [1], [2].
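The five steps of the basic algorithm translate almost line by line into code. The sketch below follows the steps as listed, without the binning or adaptive-threshold refinements; the function name, the iteration cap, and the sample data are assumptions made for illustration.

```python
def modified_euclidean(samples, eta0, max_iter=1000):
    """Estimate the period from sparse occurrence times (basic algorithm).

    samples : occurrence times s_j = k_j * tau + phi (+ noise)
    eta0    : noise threshold; elements in [0, eta0] are discarded
    """
    s = sorted(samples, reverse=True)
    for iteration in range(max_iter):
        if iteration > 0:
            s.append(0)                           # step 1
        s = [a - b for a, b in zip(s, s[1:])]     # step 2: adjacent differences
        s.sort(reverse=True)                      # step 3
        while s and s[-1] <= eta0:                # step 4
            s.pop()
        if len(s) == 1:                           # step 5
            return s[0]
        if not s:
            raise ValueError("all elements fell below the noise threshold")
    raise RuntimeError("no convergence within max_iter iterations")


# Noise-free example: tau = 7, phi = 3, k = 2, 5, 11, 14, 23, 30.
times = [17, 38, 80, 101, 164, 213]
tau_hat = modified_euclidean(times, eta0=0.5)
```

Taking differences removes the phase in the first pass; appending zero in later passes lets the smallest surviving element take part in the subtraction, as in the subtractive form of the Euclidean algorithm.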

Our algorithm is based on several theoretical results, which we now present. The first leads to a modification of the basic Euclidean algorithm, allowing a reformulation using subtraction rather than division.

Proposition 1 ([1])

$$\gcd(\tau k_1, \ldots, \tau k_n) = \tau \gcd\big( (k_1 - k_2), \ldots, (k_{n-1} - k_n), k_n \big). \qquad (2)$$


We then show that our procedure almost surely converges to the period by proving the following result. The Riemann Zeta Function is defined on the complex half space $\{z \in \mathbb{C} : \Re(z) > 1\}$ by $\zeta(z) = \sum_{n=1}^{\infty} n^{-z}$. Euler demonstrated the connection of $\zeta$ with number theory by showing (in 1736) that

$$\zeta(z) = \prod_{i=1}^{\infty} \frac{1}{1 - p_i^{-z}}, \quad \Re(z) > 1,$$

where $P = \{p_1, p_2, p_3, \ldots\} = \{2, 3, 5, \ldots\}$ is the set of all prime numbers. In the following, we let $P\{\cdot\}$ denote probability, $\mathrm{card}\{\cdot\}$ denote the cardinality of the set $\{\cdot\}$, and let $\{1, \ldots, \ell\}^n$ denote the sublattice of positive integers in $\mathbb{R}^n$ with coordinates $c$ such that $1 \le c \le \ell$. Therefore, $N_n(\ell) = \mathrm{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$ is the number of relatively prime elements in $\{1, \ldots, \ell\}^n$.

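Theorem 1 ([1]) For $n \ge 2$, we have that

$$\lim_{\ell \to \infty} \frac{N_n(\ell)}{\ell^n} = [\zeta(n)]^{-1}. \qquad (3)$$

Therefore, given $n$ ($n \ge 2$) randomly chosen positive integers $\{k_1, \ldots, k_n\}$,

$$P\{\gcd(k_1, \ldots, k_n) = 1\} = [\zeta(n)]^{-1}. \qquad (4)$$

Also,

$$\lim_{n \to \infty} [\zeta(n)]^{-1} = 1, \qquad (5)$$

converging to 1 from below faster than $1/(1 - 2^{1-n})$.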

Thus,  from  (4)  and  (5),  as  n  grows  it  quickly  becomes 
very  likely  that  n  randomly  selected  integers  have  a 
gcd  of  1.  This  fact,  together  with  Proposition  1,  make 
estimation  of  r  via  our  algorithm  possible. 
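The limit in Theorem 1 can be checked by brute force for small cases. The sketch below counts relatively prime tuples in a finite box and compares the fraction with a truncated-series value of $1/\zeta(n)$; the function names, the box size, and the truncation length are illustrative assumptions.

```python
import math
from functools import reduce
from itertools import product


def coprime_fraction(n, ell):
    """Fraction of n-tuples in {1, ..., ell}^n whose gcd equals 1."""
    count = sum(
        1
        for ks in product(range(1, ell + 1), repeat=n)
        if reduce(math.gcd, ks) == 1
    )
    return count / ell ** n


def inverse_zeta(n, terms=100000):
    """1 / zeta(n) from a truncated Dirichlet series (n >= 2)."""
    return 1.0 / sum(k ** (-n) for k in range(1, terms + 1))


pairs = coprime_fraction(2, 100)   # finite-box estimate for n = 2
limit_pairs = inverse_zeta(2)      # approximately 6 / pi^2
ten_samples = inverse_zeta(10)     # probability for n = 10 integers
```

For $n = 10$ the value of $1/\zeta(n)$ is already very close to one, which is consistent with the claim that ten data samples suffice in the noise-free case.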

3.  PRI  ANALYSIS 


The parameter estimation techniques given above lead to an effective method for periodic pulse interval analysis (see [6], [7]). We assume time is highly resolved and ignore any time quantization error. We are primarily concerned with a single periodic pulse train with (perhaps very many) missing observations that may be contaminated with outliers. Our data model for this case, in terms of the arrival times $t_j$, is given by (1), with the additional assumption that $\eta_j$ is zero-mean additive white Gaussian noise. Outliers are included as arbitrary arrival times. The problem, again, is to recover the period $\tau$ and possibly the phase $\phi$. With Gaussian noise the minimum variance unbiased estimates for this linear regression problem take a least-squares form. However, this requires knowledge
of the $k_j$'s. We therefore propose a multi-step procedure that proceeds by (i) estimating $\tau$ directly, (ii) estimating the $k_j$'s, and (iii) refining the estimate of $\tau$ using the estimated $k_j$'s in the least-squares solution. This estimate is shown to perform well, achieving the Cramer-Rao bound in many cases, despite many missing observations and contaminated data. The direct estimate of $\tau$ (step (i)) is obtained using the modified Euclidean algorithms described above. While not maximum-likelihood (ML), the modified Euclidean algorithm performs well under difficult conditions.

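We now give the maximum likelihood solution and Cramer-Rao bounds for estimating $\tau$ and $\phi$. Our analysis has led us to work with the data set $\{t_{j+1} - t_j\}_{j=1}^{n-1}$, so as to avoid estimating $\phi$ (which can be unreliable). Given the sample data set $S$ from (1) we may write

$$\begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix} = \begin{bmatrix} 1 & k_1 \\ 1 & k_2 \\ \vdots & \vdots \\ 1 & k_n \end{bmatrix} \begin{bmatrix} \phi \\ \tau \end{bmatrix} + \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_n \end{bmatrix} \qquad (6)$$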


be expressed element-wise as $[\tilde{R}_\delta^{-1}]_{ij} = \min(i, j) - ij/n$, and is therefore easily computed. Although optimal, use of (10) requires knowledge of $X_d$. This is not a problem if there are no missing observations for then $k_j = j$ for $j = 1, 2, \ldots, n$. However, when observations are arbitrarily missing then the $k_j$'s are not known in general, and one is faced with more unknowns than equations in (9).

The  pdf  of  the  noise  77  in  (7)  is  multivariate  Gaus¬ 
sian,  leading  to  the  Cramer-Rao  bound  (CRB)  for  (10) 

var{r  -  f}  >  RJ^ Xd)-\  (11) 

with  as  =  2a„.  Generally,  the  CRB  is  reduced  for 
smaller  a^.  Also,  for  fixed  n,  it  is  reduced  when  the 
spread  of  the  k^s  increases. 

Now,  if  r  were  known  then  could  be  estimated 
using  (1/r)  y.  Ideally,  this  estimate  is  composed  of 
positive  integers,  but  imperfect  knowledge  of  r  and 
the  presence  of  noise  will  generally  yield  an  estimate 
of  Xd  that  has  non-integer  components.  We  therefore 
propose  to  estimate  Xd  via 


where  kj^i  >  kj.  In  compact  form  this  is 

t  =  X/3-h77,  (7) 

where  /3  =  [<j>,  r]^  and  the  definitions  of  t,  r/,  and  X 
follow  fi'om  (6).  We  eliminate  <{>  by  forming  the  dif¬ 
ferences  yj  =  tj^i  —  tj  =  {kj^i  —  kj)T  -h  (t/j+i  — 
yielding 


Vi 

k2  -  ki 

■  <Si  ■ 

2/2 

= 

ks  -  k2 

r  + 

<^2 

_  Vn-l  . 

kfi  kfi—i 

rfn-1 

where  Sj  =  rjj^i  —  r]j.  Similar  to  (7)  we  may  write  (8) 
compactly  as 

y  =  XdT  -h  S.  (9) 

Equations  (7)  and  (9)  are  linear  regression  prob¬ 
lems  whose  least  squares  solutions  yield  the  minimum- 
variance  unbiased  estimate  when  the  noise  is  zero-mean 
Gaussian,  e.g.,  see  Kay  [4].  Generally,  use  of  (9)  is  pre¬ 
ferred  for  estimating  r,  avoiding  estimation  of  (j>  which 
has  high  variance.  The  solution  to  (9)  corresponds  to 
ML  estimation  and  takes  the  form  of  a  least  squares 
estimate 


t  =  {xJr-^x^)-^xJr-^y,  (10) 

where  R5  =  We  have  assumed  white  noise  so 

Rs  =  cr^Rs  where  R^  has  2’s  on  the  main  diagonal, 
““I’s  on  the  first  upper  and  lower  diagonals,  and  zeros 
elsewhere.  In  general  R^  is  full  rank  and  its  inverse  can 


Xd  —  round 


1 

JMEA 


(12) 


where  tmea  is  the  estimate  of  r  obtained  via  the  mod¬ 
ified  Euclidean  algorithm,  and  round[*]  =  [*+2-1 
rounding  to  the  nearest  inte^r.  A  refined  estimate  of 
r  is  then  obtained  by  using  Xd  in  (10)  yielding 


r  =  (.XjRJ^X^)-^XjR-^y.  (13) 


This  result  approaches  the  optimal  minimum  variance 
performance  when  Xd  is  close  to  Xd.  The  refinement 
algorithm  is  summarized  as  follows. 


Refined  Estimation  Algorithm 

1. )  Estimate  r  via  the  modified  Euclidean  algorithm, 
calling  this  estimate  tmea^ 

2. )  Estimate  Xd  via  (12). 

3. )  Refine  the  estimate  of  r  using  Xd  in  (13),  calling 
this  estimate  r. 
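Steps 2 and 3 of the algorithm can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the initial estimate `tau_mea` is taken as given (the modified Euclidean algorithm itself is not reproduced here), the noise is white so that the variance factor cancels in (13), and the function names `rtilde_inverse` and `refine_period` are hypothetical.

```python
import math

def rtilde_inverse(n):
    """Closed-form inverse of the (n-1)x(n-1) tridiagonal matrix with 2 on the
    main diagonal and -1 on the first off-diagonals: entry (i, j) is
    min(i, j) - i*j/n, with 1-based indices."""
    return [[min(i, j) - i * j / n for j in range(1, n)] for i in range(1, n)]

def refine_period(t, tau_mea):
    """Given sorted arrival times t and an initial period estimate tau_mea
    (assumed to come from some other estimator), return the refined period
    and the rounded integer differences."""
    n = len(t)
    y = [t[j + 1] - t[j] for j in range(n - 1)]       # differences, as in (8)
    x = [math.floor(v / tau_mea + 0.5) for v in y]    # rounding step, as in (12)
    w = rtilde_inverse(n)                             # noise variance cancels
    idx = range(n - 1)
    num = sum(x[i] * w[i][j] * y[j] for i in idx for j in idx)
    den = sum(x[i] * w[i][j] * x[j] for i in idx for j in idx)
    return num / den, x                               # weighted LS, as in (13)

# Noise-free example with missing pulses (k skips values) and a biased start.
arrivals = [0.3 + k * 2.5 for k in (1, 2, 4, 7, 8, 11)]
tau_refined, x_hat = refine_period(arrivals, 2.45)
print(tau_refined, x_hat)
```

In this noise-free example the rounded differences recover the true gaps between the $k_j$'s, so the refined estimate coincides with the true period up to floating-point error even though the initial estimate is off by two percent.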

Performance analysis of the estimate $\hat{\tau}_{MEA}$ depends not only on the distribution of the noise $\eta_j$, but also on the distribution of the $k_j$'s. We have completed this analysis for some specific cases in [6]. We also compare the estimates to Cramer-Rao bounds via Monte Carlo simulation, revealing the very good performance of the algorithm with many missing observations and contaminated data (see [6], [7]). We can also apply our estimation procedures to estimation of the frequency of a single sinusoid in Gaussian noise. We address this problem in [8], using only very sparse noisy zero-crossings in the presence of outliers.

4. DEINTERLEAVING

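We close by discussing our work on deinterleaving. Our data model is the union of $M$ copies of (1), each with different periods or "generators" $\Gamma = \{\tau_i\}$, $k_{ij}$'s and phases. Let $\tau = \max_i\{\tau_i\}$. Then our data is

\[ A = \bigcup_{i=1}^{M} \{\phi_i + k_{ij}\tau_i + \eta_{ij}\}_{j=1}^{n_i}, \]

where $n_i$ is the number of elements from the $i$th generator, $\{k_{ij}\}$ is a linearly increasing sequence of natural numbers with missing observations, $\phi_i$ is a random variable uniformly distributed in $[0, \tau_i)$, and the $\eta_{ij}$'s are zero-mean iid Gaussian with standard deviation $\sigma_{ij}$ satisfying $3\sigma_{ij} < \tau/2$. We think of the data as events from $M$ periodic processes, and represent it, after reindexing, as $A = \{\alpha_l\}$. Assuming only minimal knowledge of the range of $\{\tau_i\}$, namely bounds $T_L$, $T_U$ such that $0 < T_L < \tau_i < T_U$, we phase wrap the data by the mapping

\[ \Phi_\rho(\alpha_l) = \left\langle \frac{\alpha_l}{\rho} \right\rangle = \frac{\alpha_l}{\rho} - \left\lfloor \frac{\alpha_l}{\rho} \right\rfloor, \]

where $\rho > 0$ and $\lfloor\cdot\rfloor$ is the floor function. Thus $\langle\cdot\rangle$ is the fractional part, and so $\Phi_\rho(\alpha_l) \in [0, 1)$.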
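The phase wrapping step can be illustrated with a small Python sketch. It assumes noise-free data from a single generator with zero phase, and the helper names `phase_wrap` and `circular_spread` are hypothetical; the point is only to show that wrapping by the true period (or one of its integer submultiples) collapses the data near zero, while a generic $\rho$ spreads it over $[0, 1)$.

```python
import math

def phase_wrap(alpha, rho):
    """Fractional part of alpha / rho, a value in [0, 1)."""
    q = alpha / rho
    return q - math.floor(q)

def circular_spread(data, rho):
    """Largest circular distance from 0 among the wrapped values, where the
    circular distance of f in [0, 1) from 0 is min(f, 1 - f)."""
    wrapped = [phase_wrap(a, rho) for a in data]
    return max(min(f, 1.0 - f) for f in wrapped)

# Noise-free events of one generator with period 2.5 and missing pulses.
events = [k * 2.5 for k in (1, 2, 4, 7, 8, 11)]
print(circular_spread(events, 2.5))    # true period: values collapse near 0
print(circular_spread(events, 1.25))   # period / 2: also collapses
print(circular_spread(events, 1.7))    # unrelated rho: values spread out
```

The circular distance is used instead of the raw fractional part because floating-point rounding can map an exact multiple of the period to a value just below 1 rather than to 0.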

Definition 1 A sequence of real random variables $\{x_j\} \subset [0, 1)$ is essentially uniformly distributed in the sense of Weyl if given $a$, $b$, $0 \le a < b \le 1$,

\[ \frac{1}{n}\,\mathrm{card}\{1 \le j \le n : x_j \in [a, b]\} \longrightarrow (b - a) \]

as $n \to \infty$ almost surely.

Weyl's Theorem is presented in [3]. For our variation, we assume that for each $i$, $\{k_{ij}\}$ is a linearly increasing infinite sequence of natural numbers with missing observations such that $k_{ij} \to \infty$ as $j \to \infty$. We must make this assumption because the result is only approximately true for a finite length sequence.

Theorem 2 For almost every choice of $\rho$ (in the sense of Lebesgue measure) $\Phi_\rho(\alpha_l)$ is essentially uniformly distributed in the sense of Weyl.

Moreover, the set of $\rho$'s for which this is not true consists of rational multiples of $\{\tau_i\}$. Therefore, except for those values, $\Phi_\rho(\alpha_{ij})$ is essentially uniformly distributed in $[0, 1)$. Moreover, the values at which $\Phi_\rho(\alpha_{ij}) = 0$ almost surely are $\rho \in \{\tau_i/n : n \in \mathbb{N}\}$. These values of $\rho$ cluster at zero, but spread out for lower values of $n$.

We  then  map  the  phase  wrapped  data  by  non-linear 
variations  on  the  periodogram, 

F{ai,p)  =  4  lIcos2”-i(27r^) 


for $r = 2, 3, \ldots$ Now, the periodicity of sin and cos gives us that $\cos^{2r-1}(2\pi\Phi_\rho(\alpha_l)) = \cos^{2r-1}(2\pi\alpha_l/\rho)$ and $\sin^{2r-1}(2\pi\Phi_\rho(\alpha_l)) = \sin^{2r-1}(2\pi\alpha_l/\rho)$. By Theorem 2, the random variables $\Phi_\rho(\alpha_l)$ are uniformly distributed on $[0, 1)$ for almost every choice of $\rho$. We can then compute the distributions of the real and imaginary parts of $F$. The "noise-like" behavior of $\Phi_\rho(\alpha_l)$ for a.e. $\rho$ leads to a "flat" range for $F$. However, at $\rho \in \{\tau_i/n : n \in \mathbb{N}\}$, we have increasingly strong peaks as $n$ decreases. In turn, this gives the following. Let $i_0$ denote the index of the most prolific generator, and $\Re$, $\Im$ denote the real and imaginary parts.

Theorem 3

\[ \max_{\rho}\,\bigl(\Re F - |\Im F|\bigr) = \tau_{i_0}. \tag{18} \]

We then isolate the data generated by $\tau_{i_0}$ by convolution with a pulse train of width $\tau_{i_0}$, and subtract it out. We then repeat the process, terminating when $A$ equals the empty set.

Abstract

This paper presents a test which accepts or rejects, based on the data collected by an array of N sensors, the hypothesis that the sampled tempo-spatial field is spatially stationary. The proposed test is applicable to arrays in an arbitrary but known 3-dimensional geometry. It is based on the estimated second order spatial cumulant spectrum matrix, which is theoretically diagonal for a stationary spatial field. We show how the proposed test can be used for robust detection of a source in shallow water.


1.  Introduction 

This paper presents a test which accepts or rejects, based on the data collected by an array of N sensors, the hypothesis that the sampled tempo-spatial field is spatially stationary. Unlike the test proposed in [1], which is applicable only to uniform spatial sampling (i.e., only to linear, uniform arrays), our test is applicable to arrays in arbitrary but known 3-dimensional geometry. Similarly to the test for temporal stationarity of time series [3], our test uses the cumulant spectra of the tempo-spatial field. In particular, it is based on the estimated second order spatial cumulant spectrum.

It  can  be  shown  that  the  necessary  and  sufficient  condi¬ 
tions  for  the  spatial  fields  to  be  spatially  stationary  are: 

1. The sources are uncorrelated and are located in the far field zone.

2. The sources are zero mean, temporally stationary.

3. The additive noise is spatially stationary.

The last two conditions are satisfied in most applications^1, and therefore the test can be used for studying the physical scenarios related to condition 1. For example, if the propagation medium is known to be bounded (as in shallow water), spatial non-stationarity indicates existence of one or more sources in the sampled tempo-spatial field.

^1 It can be shown that while condition 3 is not always satisfied with real data collected in shallow water, a certain operation on the data can force the additive noise to be spatially stationary.

The proposed test (detector) does not employ any prior knowledge about the source, the number of the sources, the noise and/or the bounded propagation environment. Therefore, it is robust to modeling uncertainties or mismatches, which are a major problem in underwater acoustics. The performance degradation due to this robustness is studied by comparing the performance of the proposed test to those of two generalized likelihood ratio test (GLRT) detectors:

• GLRT1, in which only the spatial spectrum of the additive noise is assumed to be known,

and

• GLRT2, in which the spatial spectrum of the additive noise and the propagation channel are assumed to be known.

We show that, as expected, if there is no mismatch in the propagation channel, GLRT2 outperforms GLRT1, which outperforms the proposed test. In the presence of modeling mismatches the performance of the two GLRT detectors degrades dramatically. However, as the proposed test reflects the degree of stationarity, its performance does not vary significantly.

2.  The  proposed  test 

Assume an array of sensors with $N$ elements located at $\mathbf{x}_i$, $i = 1, \ldots, N$. Let $y_i(t)$ denote the random field measured by sensor $i$ at time $t$: $y_i(t) = y(t, \mathbf{x}_i)$, $i = 1, \ldots, N$. If the random field is zero mean, the entries of the $N \times N$ spatial covariance matrix are given by the samples of the cross-correlation function $R_y(t, \mathbf{x}_1, \mathbf{x}_2)$:

\[ [R_y(t)]_{ij} = R_y(t, \mathbf{x}_i, \mathbf{x}_j) = E\{y(t, \mathbf{x}_i)\, y^*(t, \mathbf{x}_j)\}, \quad i, j = 1, \ldots, N. \tag{1} \]

If the field is temporally stationary, then $[R_y(t)]_{ij} = R_y(t, \mathbf{x}_i, \mathbf{x}_j) = R_y(\mathbf{x}_i, \mathbf{x}_j)$.

The second order cumulant spectrum is defined by a two dimensional Fourier transform of the cross-correlation function:

\[ S_y(\mathbf{k}_1, \mathbf{k}_2) = \int\!\!\int R_y(\mathbf{x}_1, \mathbf{x}_2)\, e^{-j(\mathbf{k}_1^T \mathbf{x}_1 + \mathbf{k}_2^T \mathbf{x}_2)}\, d\mathbf{x}_1\, d\mathbf{x}_2, \tag{2} \]

where $\mathbf{k}$ is the 3-dimensional vector of the wave numbers in the three directions. If the field is spatially stationary, then $R_y(\mathbf{x}_1, \mathbf{x}_2) = \mathrm{func}(\mathbf{x}_1 - \mathbf{x}_2)$. In [2], it is shown that this condition leads to:

\[ S_y(\mathbf{k}_1, \mathbf{k}_2) = \mathrm{func}(\mathbf{k}_1)\,\delta(\mathbf{k}_1 + \mathbf{k}_2). \tag{3} \]

We construct the stationarity test by studying the estimated second order spatial cumulant matrix $[\hat{S}]_{ij} = \hat{S}_y(\mathbf{k}_{1i}, \mathbf{k}_{2j})$. Theoretically, if the field is spatially stationary, only the diagonal of the matrix $\hat{S}$ consists of non-zero elements. Our proposed test is based on this property.


The test is based on accepting or rejecting the null hypothesis:

$H_0$: spatially stationary field: $S_y(\mathbf{k}_1, \mathbf{k}_2) = f(\mathbf{k}_1)\,\delta(\mathbf{k}_1 + \mathbf{k}_2)$.

If the received signal at the array is (temporally) ergodic, then:

\[ R_y(\mathbf{x}_1, \mathbf{x}_2) = \lim_{T\to\infty} \frac{1}{T} \int_T y(t, \mathbf{x}_1)\, y^*(t, \mathbf{x}_2)\, dt \tag{4} \]

and the spatial covariance matrix can be estimated by integration over a sufficiently large observation time $T$. Hence, the samples obtained from an array of sensors are used to estimate the covariance matrix. The second order cumulant spectrum, $\hat{S}$, is calculated by applying the discrete Fourier transform.

Our test is an ad-hoc one: it evaluates the intensity of the off-diagonal entries^2 of $\hat{S}$ and compares it to a given threshold:

i^j  """ 
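The idea of measuring off-diagonal energy in the spectral matrix can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the paper: a uniform linear array (so that a plain DFT applies), the convention $\hat{S} = F R F^H$ with $F$ the unitary DFT matrix (which places the stationary part on the main diagonal), a circulant covariance as a stand-in for exact spatial stationarity, and a hypothetical statistic equal to the sum of squared magnitudes of the upper off-diagonal entries. The function names are likewise hypothetical.

```python
import cmath

def spectral_matrix(R):
    """Return S = F R F^H, where F is the unitary N-point DFT matrix.
    This assumes a uniform linear array."""
    N = len(R)
    F = [[cmath.exp(-2j * cmath.pi * k * n / N) / N ** 0.5 for n in range(N)]
         for k in range(N)]
    FR = [[sum(F[k][m] * R[m][n] for m in range(N)) for n in range(N)]
          for k in range(N)]
    return [[sum(FR[k][n] * F[l][n].conjugate() for n in range(N))
             for l in range(N)] for k in range(N)]

def offdiag_energy(S):
    """Hypothetical statistic: sum of |S_ij|^2 over the upper off-diagonal
    entries (S is Hermitian, so one triangle suffices)."""
    N = len(S)
    return sum(abs(S[i][j]) ** 2 for i in range(N) for j in range(i + 1, N))

# Circulant covariance: depends only on the (cyclic) sensor separation.
first_row = [2.0, 1.0, 0.0, 1.0]
stationary = [[first_row[(n - m) % 4] for n in range(4)] for m in range(4)]
# Sensor-dependent power: not spatially stationary.
nonstationary = [[float(i + 1) if i == j else 0.0 for j in range(4)]
                 for i in range(4)]

print(offdiag_energy(spectral_matrix(stationary)))     # essentially zero
print(offdiag_energy(spectral_matrix(nonstationary)))  # clearly positive
```

For a finite array a Toeplitz covariance that is not circulant is only approximately diagonalized by the DFT, so in practice the statistic is compared to a threshold rather than to zero.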

Selection of the threshold $\gamma$ determines the probability of "false alarm," $P_{fa}$. For example, if the additive noise is zero mean, i.i.d. Gaussian noise such that its time samples at different sensors satisfy:

\[ E\{n_m(l)\, n_p^*(l)\} = R_n(m, p) = \sigma_n^2\,\delta(m - p) \tag{6} \]

then,  the  asymptotic  distribution  of  the  test  statistic  C  under 
Ho  is  complex  Wishart  [2]  with  degrees  of  freedom  f  (iV  - 

2  Since  S  is  an  Hermitian  matrix,  the  test  uses  either  the  upper  or  the 
lower  off-diagonal  matrix. 


L  • 


Cl».  ~  H'-lf  (N-l),  1)) 

(7) 

$L$ is the number of time samples, which is roughly the time-bandwidth product, and $\chi^2(M)$ is the chi-square distribution with $M$ degrees of freedom. That is, for any given false alarm probability, $P_{fa} = \mathrm{Prob}\{\zeta \mid H_0 > \gamma\}$, the threshold $\gamma$ can be set.


3.  The  alternative  tests 

While the proposed test only uses the fact that under $H_0$ the field is stationary, both alternative tests employ specific prior information about the propagation field. For GLRT1 only the noise covariance matrix, $R_n(m, p) = E\{n_m(l)\, n_p^*(l)\}$, $m, p = 1, \ldots, N$, is assumed known. Therefore, the test accepts $H_0$ (no signal) if the measured covariance matrix is $R_n$ and rejects it otherwise. Since there is a one-to-one correspondence between the covariance matrix and the cross-spectral matrix (the second order spatial cumulant spectrum), the test can similarly be put in terms of the estimated spectral matrix $\hat{S}$. Since $\hat{S}$ is Hermitian, it is sufficient to look at either the upper or the lower off-diagonal entries of $\hat{S}$ and the diagonal terms. Putting
these  N  ^{N  -  \)  entries  in  a  vector  s,  this  vector  is 
asymptotically complex Gaussian. Under $H_0$ its mean, $\mu_0$, and covariance, $\Lambda_0$, are assumed known, while under $H_1$ its mean vector, $\mu_1$, and its covariance matrix, $\Lambda_1$, are free. Thus, the GLRT takes the form:

,  /(s|ifo)  fmo)  >"  (8) 

max^.,A,/(llif,)  /(slMf^.A]^^) 

where $\hat{\mu}_1^{ML}$ and $\hat{\Lambda}_1^{ML}$ are the maximum likelihood estimates of $\mu_1$ and $\Lambda_1$ (the sample mean and covariance of $s$). For large observation time (where the Gaussian assumption holds) the test is equivalent to:

\[ \zeta_1 = (s - \mu_0)^H \Lambda_0^{-1} (s - \mu_0) \ \underset{H_1}{\overset{H_0}{\lessgtr}}\ \gamma_1 \tag{9} \]

For the case of white, i.i.d. noise the test gets a form similar to (7) and the threshold $\gamma_1$ can be evaluated for any given $P_{fa}$.

Note that GLRT1 can also be phrased in the domain of the covariance matrix, as in [1].

The other alternative test, GLRT2, employs even more prior information than GLRT1: it also assumes that the propagation channel is modeled with unknown parameters. The test decides on one of the two hypotheses:

\[
\begin{aligned}
H_0 &: \ y(l) = n(l) \\
H_1 &: \ y(l) = g(\theta)\, s(l) + n(l)
\end{aligned}
\tag{10}
\]

where $l = 1, \ldots, L$ are the time samples, $n$ is the noise vector, $s$ is the source signal of unknown power and $g(\cdot)$ is a known function which characterizes the propagation channel. $\theta$ is the vector of the unknown source location and channel parameters. $n$ and $s$ are assumed to be zero mean,
uncorrelated  Gaussian  processes  of  known  covariance 
and 1, respectively. The distribution of the data under $H_0$ is completely known (as in GLRT1). Under $H_1$ it is known up to a few parameters ($\sigma_s^2$ and $\theta$) which appear in the covariance of a Gaussian distribution, and therefore the GLRT can be formed. The resulting test is:


\[ \zeta_2 = \min_{\theta}\bigl(\ln\phi(y, \theta) - \phi(y, \theta)\bigr) \ \underset{H_1}{\overset{H_0}{\gtrless}}\ \gamma_2 \tag{11} \]


where  for  white  noise  of  variance  (j){y,6)  = 

A  =  I  Eti  y(0y"(0-  For  this  test, 
however, it is hard to establish the threshold $\gamma_2$ analytically for a given $P_{fa}$ even when the noise is white.

Equation (11) shows that unlike GLRT1 and the proposed test, GLRT2 involves a multidimensional search procedure over the unknown vector parameter $\theta$. Furthermore, it requires knowledge of the number of sources under hypothesis $H_1$.

Since a test which employs more prior information performs better, we expect that GLRT2 would outperform GLRT1, which would outperform our ad-hoc stationarity test. In the next section we consider a practical problem where we demonstrate that this is indeed the case when the propagation channel corresponds to the modeling assumption. If, however, there are modeling mismatches, as is usually the case, e.g. in underwater acoustics, the difference in performance of the three tests becomes negligible.


4. Simulation results and conclusions


In this section, we focus on one application of the test, in a shallow water waveguide. Fig. 1 describes the environmental scenario, which was one of the more complex benchmarks used in the May 1993 NRL Workshop on Acoustic Models in Signal Processing [5]. We apply our test to detect a source in the channel. Using a normal mode propagation program, KRAKEN [4], we simulated the channel, assuming a narrow band point source at frequency 100 Hz in temporally and spatially white, zero mean, Gaussian noise. The source was located at depth $z_0 = 50$ m and range $r_0 = 7000$ m from the array. 200 snapshots of the received field were obtained by a uniform, vertical array of 13 sensors whose aperture is the depth of the propagation channel.

Figure 2 compares the receiver operation characteristic (ROC) of the tests presented in this paper, assuming no mismatch in the channel environmental parameters. Fig. 3 presents the probability of detection of a source in the channel as a function of the signal-to-noise ratio (SNR) per sensor per snapshot, with false alarm probability $10^{-3}$.

In the second experiment mismatches were induced in the environmental parameters according to Fig. 1. The performance of the tests, in the two representations of Fig. 2 and Fig. 3, is depicted in Fig. 4 and Fig. 5, respectively. The performance of the GLRTs degrades, while that of the proposed stationarity test even improves slightly.

These results show that the proposed test for source detection in shallow water is robust, while the other tests, especially GLRT2, are very sensitive to errors in the assumed environmental parameters. In addition, it is computationally simple and does not involve a search procedure over the
unknown parameters.

Figure 1. The NRL workshop "genlmis" scenario configuration.

Figure 2. Receiver operation characteristic for source detection with no environmental mismatch. (Plot title: Comparison of ROC for tests at SNR = -15 dB; horizontal axis: PFA.)

Figure 3. Probability of source detection vs. SNR with no environmental mismatch. (Plot title: Comparison of Tests for Pfa = 10^-3; horizontal axis: SNR.)

Figure 4. Receiver operation characteristic for source detection under environmental mismatch. (Plot title: Comparison of ROC for tests at SNR = -15 dB; horizontal axis: PFA.)

Figure 5. Probability of source detection vs. SNR under environmental mismatch. (Plot title: Comparison of Tests under Mismatch for Pfa = 10^-3; horizontal axis: SNR.)

Abstract

The problem addressed in this paper is the detection of cyclostationarity, and the measurement of the tendency of a process to have this property. This problem is of great importance because, in applications, algorithms using the property of cyclostationarity assume the periodicities of the statistics to be known. Thus the periodicities need to be detected or estimated, and furthermore a measure must be given in order to quantify the tendency of a process to have a given periodicity. This measure gives information about the appropriateness of using a cyclostationary model instead of a stationary one.


1.  Introduction 

As shown by the number of publications on the subject, cyclostationary processes (processes whose statistics depend periodically on time) are of great interest in many fields (communications, signal processing, hydrology, multivariate analysis, array processing...) and have been shown to provide a better model in many cases of interest than the stationary "syndrome".

The problem addressed in this paper is hypothesis testing of cyclostationarity versus stationarity for Gaussian processes. Indeed, although interesting and useful tests for the presence of cyclostationarity have been introduced in [7] [8] [10] [5], we became interested in developing another test relying on the theory of detection [2]. As will be shown in this paper, the test we develop leads to a natural "measure" of cyclostationarity, in close connection with the Kullback-Leibler information.

We first recall some properties of second order cyclostationarity and notations useful in a cyclostationary framework. We then briefly recall the works of [8] [10] and [5] on the problem of detection. We then present our test, and results concerning the cyclostationary version of Whittle's approximant, and the asymptotic power and false alarm probability of our test.

1.1.  Cyclostationarity 

A real valued time series $(x_n)_{n\in\mathbb{Z}}$ is said to be cyclostationary when the following properties hold for all $n$ and an integer $T$ [9]:

\[
\begin{aligned}
& E[x_n^2] < +\infty \\
& E[x_{n+T}] = E[x_n] \\
& r_x[n, \tau] \triangleq E[x_n x_{n+\tau}] = E[x_{n+T}\, x_{n+\tau+T}]
\end{aligned}
\tag{1}
\]

The correlation matrix $\Sigma_n$ of size $nT$ of the process is block-Toeplitz in the general case, and Toeplitz when the process is stationary, that is $T = 1$.

Due to periodicity in time, the correlation can be written as follows [9]:

\[ r_x[n, \tau] = \sum_{k=0}^{T-1} r_x^k[\tau]\, \exp\!\left(\frac{2 i \pi k n}{T}\right) \tag{2} \]

The $(r_x^k[\tau])_{k=0,\ldots,T-1,\ \tau\in\mathbb{Z}}$ are called the cyclocorrelations. One can associate to those cyclocorrelations the cyclospectra, defined as [9]:

rl[r]=  f  exp (»Ar)  (A) dA  (3) 

Jo 


1.2. The Zivanovic-Gardner Degree of Cyclostationarity

Zivanovic and Gardner have defined a degree of cyclostationarity as a distance between the correlation of the cyclostationary process and the correlation of the "closest" stationary process [8]. This definition leads to the following useful expression:


which is nothing but the ratio of the energy of the nonstationary part of the process to that of the stationary part.


1.3.  Hurd-Gerr’s  test 


We  build  our  test  on  the  basis  of  the  likelihood  ratio 
[2],  whose  optimal  properties  are  well  known: 

Psn  ((^*)i=l . r.) 

where  is  Toeplitz,  and  En  is  T-block-Toeplitz, 
which  corresponds  to  the  two  hypothesis  we  are  testing. 

The  log-likelihood  of  a  Gaussian  random  process 
takes  the  well  known  following  form: 


This test relies on the property of correlation of the spectral measure of a cyclostationary process. We thus calculate a normalized spectral correlation:

\[
\gamma(\lambda_j, h, M) =
\frac{\left| \sum_{m=0}^{M-1} f_N(\lambda_{j+m})\, \overline{f_N(\lambda_{j+h+m})} \right|^2}
{\sum_{m=0}^{M-1} \left| f_N(\lambda_{j+m}) \right|^2 \ \sum_{m=0}^{M-1} \left| f_N(\lambda_{j+h+m}) \right|^2}
\]

where $f_N(\lambda) = \sum_{n=0}^{N-1} x_n e^{-i\lambda n}$, $\lambda_r = 2\pi r/N$, and $M$ is a smoothing parameter. Then the result of Goodman is used: $P(\gamma > c) = (1 - c)^{M-1}$ under given conditions [10].
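The normalized spectral correlation can be computed directly from the DFT of the data. The following Python sketch is illustrative only: it assumes the squared-magnitude form of the statistic with frequency indices taken modulo $N$, uses a plain $O(N^2)$ DFT for self-containment, and the function names `dft`, `spectral_coherence` and `goodman_tail` are hypothetical.

```python
import cmath

def dft(x):
    """Plain O(N^2) discrete Fourier transform at frequencies 2*pi*r/N."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * r * n / N) for n in range(N))
            for r in range(N)]

def spectral_coherence(f, j, h, M):
    """Squared coherence between DFT bins j..j+M-1 and j+h..j+h+M-1
    (indices taken modulo N, an assumption of this sketch)."""
    N = len(f)
    a = [f[(j + m) % N] for m in range(M)]
    b = [f[(j + h + m) % N] for m in range(M)]
    num = abs(sum(u * v.conjugate() for u, v in zip(a, b))) ** 2
    den = sum(abs(u) ** 2 for u in a) * sum(abs(v) ** 2 for v in b)
    return num / den

def goodman_tail(c, M):
    """Tail probability P(gamma > c) = (1 - c)^(M - 1) under the null."""
    return (1.0 - c) ** (M - 1)

f = dft([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
g = spectral_coherence(f, 0, 1, 2)
print(g, goodman_tail(g, 2))
```

By the Cauchy-Schwarz inequality the statistic always lies in $[0, 1]$, and with $M = 1$ it is identically 1, which is why some smoothing ($M > 1$) is needed for the test to be informative.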


1.4. Dandawaté-Giannakis's test


This test uses the asymptotic properties of the estimators of cumulants, and the test is formulated as follows: for a candidate cycle $\alpha$ one tests the following hypotheses:

\[
\begin{aligned}
H_0 &: \ \hat{c}^{\alpha}_{k,x} = \varepsilon^{\alpha}_{k,x} \ \text{ for all arguments,} \\
H_1 &: \ \hat{c}^{\alpha}_{k,x} = c^{\alpha}_{k,x} + \varepsilon^{\alpha}_{k,x} \ \text{ for some arguments,}
\end{aligned}
\tag{6}
\]

where $c^{\alpha}_{k,x}$ is nonzero (it is the cyclic cumulant of cycle $\alpha$ of the process $x$), and $\varepsilon^{\alpha}_{k,x}$ is a zero mean random variable. The asymptotic statistics of $\varepsilon^{\alpha}_{k,x}$ are a classical result, from which a hypothesis test is built, allowing one to take a statistical decision.


2. Testing Cyclostationarity versus Stationarity


2.1. Expression of the test

We consider a Gaussian random process $(x_k)_{k\in\mathbb{Z}}$ which can be cyclostationary of period $T$ or stationary. Without any loss of generality, we assume the process to be zero mean. The aim of the test is to decide whether:


Ln  —  log  (Pn  ((^*)fc=l,...,n)) 

=r  {nlog(2;r) -|-log(|E|)-(-x].E"‘£„}  (9) 

Studying the asymptotic behaviour of the test is not straightforward, and an alternative consists in developing the "principal part" of the log-likelihood.

2.2.  Approximant  of  the  log-likelihood 

We  introduce  the  following  notations: 

J  [j^'"log(det(JF(A)))  (10) 

+Tr  (l„  (A)  J--'  (A))  dA]  +  — 

[^(A)W)  =  (11) 

We  will  call  the  Gladysev’s  spectral  matrix  associ¬ 
ated  with  (*fc)fcgE-  The  following  scalar  product  will  be 
required: 

(^’o,  Fi>  1/2  =  1*1  (^0  [fc]  Fi  w)  (12) 

ke^ 

on  the  Banach  space  Br  of  Tx  T  matrix  valued  periodic 
functions  F  e  whose  associated  series  converge  to  F 
and  l|P||i/2  <  W  Fourier  coefficient 

of  F  (.)).  We  will  also  use  the  same  notation  for  the 
scalar  case:  more  precisely,  when  being  concerned  with 
scalar  products  containing  T r  (A)  or  det  (A)  the  same 
notation  will  be  used. 

One can extend the results available in the scalar Gaussian stationary case [3], concerning the asymptotic behaviour of the log-likelihood, to a cyclostationary Gaussian process [1], which leads to the following:


$H_0$: $(x_k)_{k\in\mathbb{Z}}$ is stationary

$H_1$: $(x_k)_{k\in\mathbb{Z}}$ is cyclostationary


Theorem 1 If $\mathcal{F}_0$ and $\mathcal{F}_1$ are the Gladysev's spectral matrices of two Gaussian cyclostationary random processes, such that $\mathcal{F}_0$ and $\mathcal{F}_1$ belong to $B_T$ and $\det(\mathcal{F}_1) > 0$, then

^To[Zn[T,)]  =  -i[i||log(det(JF0)ll^/2 

+  (log(det(:ri)),Tr 
+  {Tr{T„Tr{T:^)))^^^+o{l) 

Yar^,[Zr,{Ti)]  =  0(1)  (13) 

and  when  Ti  =  /y  then, 

Yarj^^[Zn{Ti)]  =  -  (log  (det  (:Fi))  ,Tr 

-2||Tr(j^-)li;^^+o(l) 

Similar results have already been demonstrated in the multivariate case in several papers (see references in [6]), but the results presented here are more precise. In practice, under given hypotheses, these results give the precise rate of convergence to the approximation of the likelihood ratio.


2.3.  Asymptotic  behaviour  of  the  test 


This section deals with a direct application of the result obtained in the preceding section. It allows one to define a degree of cyclostationarity relying on the likelihood ratio test, which has an interpretation in terms of the Kullback-Leibler information.

The  log  likelihood  ratio  takes  the  following  well 
known  form: 

Qnt  =  log(po((x„)„^l 

~log  (pi  ((^n)„_i . JVt))  (1“^) 

and  we  introduce  the  asymptotic  form  of  the  likelihood 
ratio: 

^  \T  r 

^  Jo  P°s(det(:r(A))/(j;?^(A))^j  (15) 

+  Tr  {B)  (A)  -  (A)  /.)"')  }]  dA 

The results introduced in the preceding section show that:

\^/i  (e^T  -  Snt)  I ""  O  (1)  (16) 

Furthermore,  one  has  the  following  Lemma  (see  refer¬ 
ences  in  [6]),  whose  scalar  proof  was  given  by  Gren- 
ander  and  Rosenblatt: 


Lemma  2  Let  F  he  a  bounded  odd  matrix  valued  func- 
tionSj  and  the  multidimensionnal  periodogram  as¬ 
sociated  with  a  realisation  of  a  Gaussian  processes  of 
spectral  $  density  then,  as  n  — >■  -hoo: 


Et  Tr  (T„F^  dX^  ^  Tr  ($F)  dA  and, 


(  f  Tr  (J„f)  d. 

0~  -  / 

\Jo  '  ' 

J  « x 

This Lemma can be straightforwardly extended to the cyclostationary case [1], where $F$ is replaced by the Gladysev's spectral matrix $\mathcal{F}$.

Then,  using  this  Lemma,  and  considering  —^Qnt 
as  iV  -hoo  under  hypotheses  one  obtains: 

(17) 


where 

^  ^  I  [log  (det  (:F)/ (18) 

+  T-Tr{F/Tl)]d\  (19) 

which  can  be  interpreted  according  to  the  Kullback- 
Leibler  information  of  the  corresponding  cyclostation¬ 
ary  and  stationary  processes  that  is 

ti  =  K{T,TllT)  (20) 


3.  Asymptotic  probabilities 


In this section, we study the asymptotic power $\alpha$ and the asymptotic false alarm probability $\beta$ of the likelihood ratio test. These results have been developed in the framework of stationary scalar Gaussian processes [4]. As shown below, the proof of the results in the cyclostationary case is not different from what can be found in [4]. The results only had to be checked and written down once and for all.

3.1.  Preliminary 

Here  we  recall  the  following  theorem: 

Theorems  If  log  [det  (Tq))  and  log  [dei[Ti))  belong 
to  then  for  i  G  [0, 1] 

QnT  (^)  ”4^  Qo  (f) 


where 


Qo  (^)  =  ^  y  log  (det  (^o)) 

+  (l-01og(det(J-i))  (21) 
-  log  {t  det  (Fo)  +  (1  - 1)  det  (J-j))]  dA 


QnT  {t)  =  ^  log 


Eo  I  exp  ( t  log 


dp®nT 

dp^ 
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The  calculus  for  the  proof  are  the  same  matrix  ma¬ 
nipulations  in  the  block- Toeplitz  case  as  in  [4]  for  the 
stationary  case,  and  Szego’s  theorem  for  the  cyclosta¬ 
tionary  case  (which  is  required  for  the  proof),  is  impli- 
citely  shown  in  [1]. 

The  series  of  probabilities  associated 

with  the  spectral  matrices  and  is  said  in  that 
case  to  be  of  Chernoff. 

3.2. Formula "à la Chernoff"

The  following  theorem  is  demonstrated  in  [4]  for  the 
stationary  scalar  case,  and  the  extension  to  the  cyc¬ 
lostationary  case  is  not  difficult  [1]. 

Theorem  4  Lei  us  consider  a  series  of 

Chernoff j  we  further  suppose  that: 

L  K  {Po,Pi)  and  K  (Pi,Po)  exist 

2,  We  define  a  G  [—K  (Po,  Pi)  (Pi,  Pq)] 

then,  as  n  H-oc 

ilog(/>?"(ilog(^)<«))  ^  .  +  /..(«) 

where 

$h_0(a) = \sup_{\theta}\,\{\theta a - Q_0(\theta)\}$ (22)

and $K(P_0, P_1)$ is the Kullback information between the two processes.

Thus in the cyclostationary case, in the framework of our test, the asymptotic power and false alarm probability of the Neyman-Pearson test can be written in the following interesting form:

$\alpha(a) = -h(a)$ and $\beta(a) = a - \alpha(a)$ (23)

$h_1(a) = \sup_{\theta}\,\{\theta a - Q_1(\theta)\}$

dG® 

4. Conclusion

We propose a test for cyclostationarity based on decision theory, which gives rise to a measure of cyclostationarity that can be connected to the Kullback-Leibler information of information theory. This test allows one to test whether a frequency is a cyclic frequency of a given process, and gives an indicator of the degree of cyclostationarity with respect to this cyclic frequency. This work provides a statistical test, a strong theoretical background for the notion of degree of cyclostationarity first introduced by Zivanovic and Gardner, and a new measure of cyclostationarity. Applications of this test will appear in a later paper.
Abstract

This note is concerned with the problem of determining the countable set $\Lambda = \{\lambda_1, \lambda_2, \ldots\}$ of frequencies belonging to an almost periodic sequence by methods in which a finite set of frequencies $\Lambda_n$ is produced at each stage $n$. We seek algorithms for which $\Lambda_n$ converges to $\Lambda$ and yet each $\Lambda_n$ is not too big.


1.  Introduction 

Almost periodic sequences share many properties of AP (almost periodic) functions [1]. For example, the Fourier coefficient

$a(\lambda) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} x_k \exp(-i\lambda k)$ (1)

exists for every $\lambda$ and Parseval's equality implies that the set of frequencies $\Lambda = \{\lambda : a(\lambda) \ne 0\}$ belonging to the AP sequence is countable. In the case of AP sequences the frequencies may all be taken in the interval $[0, 2\pi)$. The frequencies $\Lambda$ and coefficients $\{a(\lambda), \lambda \in \Lambda\}$ are uniquely determined by the AP sequence $x$ and for this reason it is said that each AP sequence has an associated (unique) Fourier series,

$x_k \sim \sum_{j=-\infty}^{\infty} a(\lambda_j) \exp(i\lambda_j k)$ (2)

where $\sim$ is not to be taken as equality except under additional assumptions, for example, when $\Lambda$ is finite.

This work is motivated by some problems of estimation [4, 3] and detection [2] for stochastic processes connected with almost periodicity. Here we begin to treat a simpler problem, the empirical determination of the frequencies $\Lambda$ of an AP sequence, to illustrate a problem involving finite computation. That is, suppose we are given the sequence $\{x_k\}$ one element at a time; in other words, we are given the finite sequences $X_n = \{x_k,\ k = 0, 1, \ldots, n\}$ for $n = 0, 1, \ldots$. Now if we are told, a priori, that some $\lambda \in \Lambda$, then from (1), the sequence

$a_n(\lambda) = \frac{1}{n} \sum_{k=0}^{n-1} x_k \exp(-i\lambda k)$ (3)

converges to $a(\lambda) \ne 0$, and if $\lambda \notin \Lambda$, then $a_n(\lambda) \to 0$. But if $\Lambda$ is unknown a priori, we are faced with performing the limit¹ for an uncountable collection of $\lambda$ to determine the countable set $\Lambda$ on which $a(\lambda) \ne 0$.

¹In the context where the index $k$ corresponds to time, we must know the sequence for infinite time.

So given the practical constraint of finite computations, we wish to determine $\Lambda$ by some sort of limit of a sequence of operations, each of which involves only a finite number of calculations. We address the problem in the following manner: at each computation stage $n$ we compute a finite number of frequencies

$\{\lambda_1^{(n)}, \lambda_2^{(n)}, \ldots\} = \Lambda_n$ (4)

using only the subsequence $X_n$. We seek algorithms for determining $\Lambda_n$ from $X_n$ that have the following properties:
1. $\Lambda_n$ converges to $\Lambda$ in the sense that for every $\lambda \in \Lambda$ there is a sequence $\{\lambda_n\}$ with $\lambda_n \in \Lambda_n$ and $\lambda_n \to \lambda$;

2. $\Lambda_n$ is not too big in the sense that convergent sequences taken from the $\Lambda_n$ converge only to elements of $\Lambda$; this additional constraint is needed because the finite set $\{2\pi j/n,\ j = 0, 1, \ldots, n - 1\}$ satisfies property 1, yet is too big in the sense that some elements may not be near an element of $\Lambda$;

2'. a sufficient condition for 2 is that there exists a resolution function $\epsilon(n) \to 0$ such that for every $\lambda' \in \Lambda_n$ there is a $\lambda \in \Lambda$ with

$|\lambda' - \lambda| < \epsilon(n).$ (5)
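To see why property 1 alone is not enough, the following minimal sketch (not part of the original note) evaluates the uniform grid $\{2\pi j/n\}$ against a hypothetical singleton $\Lambda = \{1.0\}$. The names `grid` and `circ_dist` and the chosen frequency are illustrative assumptions. The nearest grid point approaches the true frequency as $n$ grows, but the grid always contains points roughly $\pi$ away from it.

```python
import numpy as np

def grid(n):
    """Uniform frequency grid {2*pi*j/n, j = 0, ..., n-1}."""
    return 2 * np.pi * np.arange(n) / n

def circ_dist(a, b):
    """Distance between angles on the circle [0, 2*pi)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

lam_true = 1.0  # hypothetical single frequency, i.e. Lambda = {1.0}
for n in (10, 100, 1000):
    d = circ_dist(grid(n), lam_true)
    # d.min() shrinks like pi/n (property 1 holds);
    # d.max() stays close to pi (property 2 fails: the set is too big).
    print(n, d.min(), d.max())
```

The shrinking minimum and the non-shrinking maximum are exactly the two behaviours that properties 1 and 2 separate.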


Our approach is through the weighted Fourier coefficient estimator

$a_n^w(\lambda) = \frac{1}{2n} \sum_{k=0}^{2n} w_{k-n}^{(n)} x_k \exp(-i\lambda k)$ (6)

where $w_k^{(n)}$ is the Bartlett weight sequence that has considerable application in spectral density estimation. It is given by

$w_k^{(n)} = \begin{cases} 2(1 - |k/n|), & |k| < n \\ 0, & |k| \ge n \end{cases}$ (7)

and in order to keep the estimation procedure one-sided as in (6), we center the sequence at $n$. The discrete Fourier transform of (7) is

$W_n(\lambda) = \sum_{k=-(n-1)}^{n-1} w_k^{(n)} \exp(-i\lambda k) = \frac{2}{n} \frac{\sin^2(n\lambda/2)}{\sin^2(\lambda/2)}$ (8)

and this is related to (6) by

$\exp(i\lambda n) \sum_{k=0}^{2n} w_{k-n}^{(n)} \exp(-i\lambda k) = W_n(\lambda).$ (9)

The factor of 2 appears in (7) in order to ensure $\lim_{n\to\infty} a_n^w(\lambda) = a(\lambda)$ and the $\exp(i\lambda n)$ is needed to account for the centering of the window in $[0, 2n]$ rather than $[-n, n]$. The fraction $1/2n$ appearing in (6) is used in place of $1/(2n + 1)$ for ease in computation and in the presentation.

Note that $W_n(\lambda)$ is periodic with period $2\pi$, $W_n(\lambda) = W_n(\lambda + 2\pi)$, and $W_n(0) = 2n$.

The fact $W_n(0) = 2n$ and the observation that $|W_n(\pi)|$ does not exceed $2/n$ together motivate the following lemma.

Lemma 1 For the Bartlett kernel $W_n(\lambda)$, given $\delta > 0$ there exists a number $K > 0$ and an integer $n_0$ for which

$\frac{W_n(\lambda)}{W_n(0)} \le \frac{K}{n^2}$ (10)

for all $|\lambda| > \delta$ and $|\lambda - 2\pi| > \delta$, provided $n > n_0$.
Proof. Choose $n_0 > \pi/\delta$, then

$\frac{W_n(\lambda)}{W_n(0)} = \frac{1}{n^2} \frac{\sin^2(n\lambda/2)}{\sin^2(\lambda/2)} \le \frac{1}{n^2 \sin^2(\delta/2)}$ (11)

and the result follows by the identification $K = 1/\sin^2(\delta/2)$. ■

2. $\Lambda$ is a singleton


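In this case,

$x_k = a_0 \exp(i\lambda_0 k)$ (12)

and for any $\lambda$ we have

$a_n^w(\lambda) = \frac{a_0}{2n} W_n(\lambda - \lambda_0)$ (13)

up to a phase factor of unit modulus, which does not affect $|a_n^w(\lambda)|$. It may be easily shown that $W_n(\lambda - \lambda_0)$ has a global maximum at $\lambda_0$ and is locally maximum in the neighborhood of $\lambda_0$ given by $|\lambda - \lambda_0| < \pi/n$.

The following procedure meets the constraint of finite computation at each stage $n$ and produces the required convergence of $\Lambda_n$ to $\Lambda$. First, compute $a_n^w(\lambda)$ for the distinct values $\lambda_j^{(n)} = j2\pi/n$, $j = 0, 1, \ldots, n - 1$ and put $\lambda_j^{(n)}$ into $\Lambda_n$ provided it maximizes $|a_n^w(\lambda_j^{(n)})|$ for $j = 0, 1, \ldots, n - 1$.

The localization properties of the function $W_n(\lambda)$ show that each set $\Lambda_n$ will contain at most two elements, the two $\lambda_j^{(n)}$ surrounding $\lambda_0$. That is, if $\lambda_j^{(n)} < \lambda_0 < \lambda_{j+1}^{(n)}$, then one or both of $\lambda_j^{(n)}, \lambda_{j+1}^{(n)}$ will be elements of $\Lambda_n$ and so every element $\lambda^{(n)} \in \Lambda_n$ satisfies $|\lambda^{(n)} - \lambda_0| < 2\pi/n$.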
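The singleton procedure can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: the estimator is taken to be the one-sided Bartlett-weighted sum $(1/2n)\sum_{k=0}^{2n} w_{k-n}^{(n)} x_k \exp(-i\lambda k)$, and the function names and the test signal (`lam0`, `a0`) are hypothetical. It also checks the closed form of the Bartlett kernel against the direct sum.

```python
import numpy as np

def bartlett_weights(n):
    """Weights 2(1 - |k|/n) for k = -n, ..., n (zero at k = -n and k = n)."""
    k = np.arange(-n, n + 1)
    return 2.0 * np.clip(1.0 - np.abs(k) / n, 0.0, None)

def kernel_direct(n, lam):
    """W_n(lam) as the direct sum of w_k exp(-i lam k)."""
    k = np.arange(-n, n + 1)
    return float(np.real(np.sum(bartlett_weights(n) * np.exp(-1j * lam * k))))

def kernel_closed(n, lam):
    """Closed form (2/n) sin^2(n lam/2) / sin^2(lam/2); lam not a multiple of 2*pi."""
    return (2.0 / n) * np.sin(n * lam / 2) ** 2 / np.sin(lam / 2) ** 2

def weighted_coeff(x, n, lam):
    """Assumed estimator: (1/2n) sum_{k=0}^{2n} w_{k-n} x_k exp(-i lam k)."""
    k = np.arange(2 * n + 1)
    return np.sum(bartlett_weights(n) * x[: 2 * n + 1] * np.exp(-1j * lam * k)) / (2 * n)

def singleton_stage(x, n):
    """Grid frequencies 2*pi*j/n that maximize |a_n^w| at stage n."""
    grid = 2 * np.pi * np.arange(n) / n
    mags = np.array([abs(weighted_coeff(x, n, lam)) for lam in grid])
    return grid[np.isclose(mags, mags.max())]

# Hypothetical test signal: a single frequency lam0 with amplitude a0.
lam0, a0, n = 1.2345, 0.7 - 0.2j, 200
x = a0 * np.exp(1j * lam0 * np.arange(2 * n + 1))
Lambda_n = singleton_stage(x, n)
print(Lambda_n, np.abs(Lambda_n - lam0).max(), 2 * np.pi / n)
```

Only magnitudes are compared, so the phase introduced by centering the window at $k = n$ plays no role in the selection.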

3. $\Lambda$ is finite

When $\Lambda$ is finite, say card($\Lambda$) = $J$,

$x_k = \sum_{j=1}^{J} a_j \exp(i\lambda_j k)$ (14)

and note that there exists a $\delta > 0$ for which $|\lambda_j - \lambda_{j'}| > \delta$ for all $j, j'$ with $j \ne j'$.

We will choose the elements of $\Lambda_n$ to be the strong local peaks among the finite collection of $a_n^w(\lambda)$ computed at each stage.
To complete the argument, first take n sufficiently
large,  say  n  >  no,  so  that  >  10.  For  any  fixed 

n  >  no,  let 

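$\lambda_{j^*}^{(n)} = \arg\max_{\lambda_j^{(n)} \in (\lambda', \lambda'')} |a_n^w(\lambda_j^{(n)})|.$

But for there to be a strong local peak at $\lambda_{j^*}^{(n)}$ also requires

$|a_n^w(\lambda_{j^*}^{(n)})| > 10\,|a_n^w(\lambda_j^{(n)})|$

for $3C < |j^* - j| < 4C$, which thus requires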

>10 

From the preceding discussion, for sufficiently large $n$, this ratio must converge to 1 and hence will not exceed, say, 2. It may be seen that for sufficiently large $n$, no $\lambda_j^{(n)} \in (\lambda', \lambda'')$ will be selected as a strong local peak. Finally we note again that we do not know $\Lambda$ or the numbers $\{a(\lambda), \lambda \in \Lambda\}$ a priori and so the value of $n_0$ for which $n > n_0$ produces these results is not known to us. One cannot say when these events occur, only that they will.
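The strong-local-peak rule of this section can be tried on a small synthetic case. The sketch below is illustrative only: it assumes the Bartlett-weighted estimator evaluated on the grid $2\pi j/Cn$, treats grid indices circularly, excludes $j = j^*$ from the first comparison, and uses the parameter values $C = 4$, $K_1 = 16$, $K_2 = 12$, $K_3 = 10$ quoted in the text. The function names and the two-frequency test signal are hypothetical.

```python
import numpy as np

def bartlett_coeffs(x, n, C):
    """|a_n^w| on the grid 2*pi*j/(C*n), j = 0..C*n-1, from samples x_0..x_{2n}."""
    k = np.arange(2 * n + 1)
    w = 2.0 * (1.0 - np.abs(k - n) / n)          # Bartlett weights centred at k = n
    grid = 2 * np.pi * np.arange(C * n) / (C * n)
    E = np.exp(-1j * np.outer(grid, k))          # direct evaluation; an FFT would be faster
    return grid, np.abs(E @ (w * x[: 2 * n + 1])) / (2 * n)

def strong_local_peaks(mags, K1, K2, K3):
    """Indices j* with mags[j*] > mags[j] for 0 < |j*-j| < K1 and
    mags[j*] > K3 * mags[j] for K2 < |j*-j| < K1 (indices taken circularly)."""
    m = len(mags)
    offs = np.arange(1, K1)
    dist = np.concatenate([offs, offs])
    peaks = []
    for js in range(m):
        nbrs = np.concatenate([(js - offs) % m, (js + offs) % m])
        if np.all(mags[js] > mags[nbrs]) and np.all(mags[js] > K3 * mags[nbrs[dist > K2]]):
            peaks.append(js)
    return np.array(peaks, dtype=int)

# Hypothetical finite case: two frequencies with different amplitudes.
n, C, K1, K2, K3 = 256, 4, 16, 12, 10
true_freqs = np.array([1.0, 2.5])
amps = np.array([1.0, 0.5])
k = np.arange(2 * n + 1)
x = (amps[:, None] * np.exp(1j * np.outer(true_freqs, k))).sum(axis=0)

grid, mags = bartlett_coeffs(x, n, C)
Lambda_n = grid[strong_local_peaks(mags, K1, K2, K3)]
print(Lambda_n)
```

In this example the selected frequencies are the grid points nearest the two true frequencies; sidelobe maxima are rejected because a neighbor closer to the main lobe is always larger.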

4. $\Lambda$ has isolated cluster points

Here we consider the class of AP sequences for which $\sum_j |a_j| < \infty$. Suppose $\lambda_0$ is the cluster point of $\Lambda$ and $a_0 = a(\lambda_0)$. We shall assume $\lambda_0$ is an interior point of $[0, 2\pi)$, and omit the adjustments for the case when $\lambda_0 = 0$. For any $\epsilon > 0$ there is a deleted neighborhood $D(\lambda_0, \delta) = (\lambda_0 - \delta, \lambda_0 + \delta) \setminus \{\lambda_0\}$ of $\lambda_0$ for which

$\sum_{\lambda_j \in D(\lambda_0, \delta)} |a_j| < \epsilon/2.$

Take $\epsilon = |a_0|/10$. The examination of $a_n^w(\lambda)$ at the point $\lambda_0$ will help us understand the new situation. Consider then

$a_n^w(\lambda_0) = a_0 + \frac{1}{2n} \sum_{\lambda_j \in D(\lambda_0, \delta)} a_j W_n(\lambda_0 - \lambda_j) + \frac{1}{2n} \sum_{\lambda_j \notin D(\lambda_0, \delta),\ \lambda_j \ne \lambda_0} a_j W_n(\lambda_0 - \lambda_j)$ (22)

Because the last sum is finite we know from the previous section it is $O(1/n^2)$, so there is an $n_0$ for which $n > n_0$ implies this quantity will not exceed $\epsilon/2$. As for the middle term,

$\left| \frac{1}{2n} \sum_{\lambda_j \in D(\lambda_0, \delta)} a_j W_n(\lambda_0 - \lambda_j) \right| \le \frac{1}{2n} \sum_{\lambda_j \in D(\lambda_0, \delta)} |a_j|\,|W_n(\lambda_0 - \lambda_j)| \le \sum_{\lambda_j \in D(\lambda_0, \delta)} |a_j| < \epsilon/2$ (23)

Of course we already know that $a_n^w(\lambda_0) \to a(\lambda_0)$, but this permits us to see what happens when we evaluate $a_n^w(\lambda)$ at the nearest point $\lambda_{j^*}^{(n)}$ to $\lambda_0$. For $n$ sufficiently large, the points not in $D(\lambda_0, \delta)$ will contribute to $a_n^w(\lambda_{j^*}^{(n)})$ as described in the finite case. The middle term contributes at most $\epsilon/2$ (or $|a_0|/20$) and so the contribution of $a_0$ will eventually dominate the rest (in this example, by a factor of 10).

The demonstration that $\Lambda_n$ is not too big is in progress; we expect it to follow as in the finite case. The idea is to show that in any interval $(\lambda', \lambda'')$ containing no points of $\Lambda$, $a_n^w(\lambda)$ will ultimately not have any strong local peaks. Even though $\Lambda$ is infinite, there are only a finite number of frequencies outside a neighborhood of $\lambda_0$, and those inside have summable amplitudes, so their contribution to $a_n^w(\lambda)$ for $\lambda \in (\lambda', \lambda'')$ will eventually become negligible.
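To be more specific, we compute $a_n^w(\lambda)$ for the values $\lambda_j^{(n)} = j2\pi/Cn$, $j = 0, 1, \ldots, Cn - 1$ where $C$ is a positive integer. For some arbitrary $\lambda_{j_0} \in \Lambda$ denote as $\lambda_{j^*}^{(n)}$ the $\lambda_j^{(n)}$ closest to $\lambda_{j_0}$ and consider the expression

$a_n^w(\lambda_{j^*}^{(n)}) = \frac{a_{j_0}}{2n} W_n(\lambda_{j^*}^{(n)} - \lambda_{j_0}) + \frac{1}{2n} \sum_{j=1,\ j \ne j_0}^{J} a_j W_n(\lambda_{j^*}^{(n)} - \lambda_j).$ (15)

If the $\lambda_j^{(n)}$ are sufficiently dense with respect to the kernel $W(\cdot)$, then the first term on the right is close to $a_{j_0}$, say

$\left| \frac{1}{2n} a_{j_0} W_n(\lambda_{j^*}^{(n)} - \lambda_{j_0}) \right| > .9\,|a_{j_0}|$ (16)

and the second term is $O(1/n^2)$ because all the remaining $\lambda_j$ are "far" from $\lambda_{j^*}^{(n)}$. We are led to conclude that there exists an interval $I_n(j_0, j_1)$ between $\lambda_{j_0}$ and the next largest $\lambda_j$ (call it $\lambda_{j_1}$) for which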


a 


w 

n 


(A<")) 


- 

0{ajo)  +  0(;^) 


(17) 


for all $\lambda_j^{(n)} \in I_n(j_0, j_1)$. In other words, for sufficiently large $n$, the value of $|a_n^w(\lambda_{j^*}^{(n)})|$ will begin to dominate, as $n^2$, the values taken in a nearby interval. Immediately neighboring values may have to be excluded because of the shape of $W(\cdot)$.

A frequency index $j^*$ is said to produce a strong local peak with parameters $K_1, K_2, K_3$ if

$|a_n^w(\lambda_{j^*}^{(n)})| > |a_n^w(\lambda_j^{(n)})|$

for $0 < |j^* - j| < K_1$, and

$|a_n^w(\lambda_{j^*}^{(n)})| > K_3\,|a_n^w(\lambda_j^{(n)})|$

for $K_2 < |j^* - j| < K_1$. So a strong local peak is at least $K_3$ times larger than its neighbors except for those nearby ($|j^* - j| < K_2$). The elements of $\Lambda_n$ are the frequencies $\lambda_{j^*}^{(n)}$ associated with the strong local peaks.

In general, the value of $K_3$ is to be considered large (e.g. $K_3 = 10$ or $K_3 = 100$), and the values of $K_1$ and $K_2$ will depend on the Fourier transform of the weight sequence $w_k$.

For some arbitrary $\lambda_{j_0} \in \Lambda$, if we take $C = 4$, $K_1 = 16$, $K_2 = 12$ and $K_3 = 10$ with the Bartlett sequence, then for $n$ sufficiently large we will obtain a $\lambda_{j^*}^{(n)}$ that satisfies $|\lambda_{j^*}^{(n)} - \lambda_{j_0}| < 2\pi/Cn$. Hence for each $\lambda_{j_0} \in \Lambda$ one may find a sequence $\{\lambda^{(n)}\}$ with $\lambda^{(n)} \in \Lambda_n$ and $\lambda^{(n)} \to \lambda_{j_0}$.

Now we must show that the $\Lambda_n$ are not too big. We will show that in any open interval $(\lambda', \lambda'')$ containing no points of $\Lambda$, $a_n^w(\lambda)$ will ultimately not have any strong local peaks.

Consider the expression

$a_n^w(\lambda) = \frac{1}{2n} \sum_{j=1}^{J} a_j W_n(\lambda - \lambda_j)$ (18)

for $\lambda$ in $(\lambda', \lambda'')$. Although the values $|a_n^w(\lambda)|$ can vary significantly for $\lambda \in (\lambda', \lambda'')$, we will now show that

$\lim_{n\to\infty} \frac{|a_n^w(\lambda + 2\pi/n)|}{|a_n^w(\lambda)|} = 1$ (19)

for $\lambda$ in this interval. Clearly $\lambda + 2\pi/n \to \lambda$ as $n \to \infty$. The ratio in (19) may be expressed as

$\frac{\left| \sum_{j=1}^{J} A_j^{(n)}(\lambda + 2\pi/n) / B_j(\lambda + 2\pi/n) \right|}{\left| \sum_{j=1}^{J} A_j^{(n)}(\lambda) / B_j(\lambda) \right|}$ (20)

where $A_j^{(n)}(\lambda) = a_j \sin^2[n(\lambda - \lambda_j)/2]$ and $B_j(\lambda) = \sin^2[(\lambda - \lambda_j)/2]$. Because $A_j^{(n)}(\lambda + 2\pi/n) = A_j^{(n)}(\lambda)$ is bounded with respect to $n$, the following Lemma may be applied.

Lemma 2 If $\alpha_j^{(n)}$, $j = 1, 2, \ldots, J$ are each bounded sequences and $\beta_j^{(n)}$, $j = 1, 2, \ldots, J$ are each convergent sequences with $\beta_j^{(n)} \to \beta_j$, then


Itmn^oo  J  (n)i-rJ  o 


(21) 


The  proof  follows  from  setting  where 


flp)  ^  0  for  all  j. 

In our current problem we set $\alpha_j^{(n)} = A_j^{(n)}(\lambda)$, $\beta_j^{(n)} = B_j(\lambda + 2\pi/n)$ and $\beta_j = B_j(\lambda)$, and we note that $\beta_j^{(n)} \to \beta_j$ because the $B_j(\lambda)$ are continuous functions of $\lambda$. Thus by application of the Lemma we obtain (19). Furthermore, considered as functions of $\lambda$, the convergence is uniform because the $\alpha$'s are uniformly bounded and the $\beta$'s come from shifts of the single continuous function $1/\sin^2(x)$ for $x$ bounded away from 0 and $2\pi$. All of this remains true when $\lambda + 2\pi/n$ is replaced with $\lambda \pm K2\pi/n$ for finite (fixed) $K$.
Abstract

An important feature available in certain scenarios for the underwater sonar detection of broadband signals is the formation of "striated" patterns in spectrograms. The observed striations can be modeled by a broadband multipath signal whose time delay between arrivals varies slowly and linearly. This model leads to a generalized notion of cyclostationarity where the cyclic frequency varies linearly with frequency. In this paper, noncoherent and coherent methods to detect these broadband energy patterns based on this model are presented and demonstrated on a nontrivial example. While noncoherent methods cannot distinguish between positive and negative delay rates, coherent methods determine the sign in addition to an estimate of the initial multipath delay.


1.  INTRODUCTION 


Very low frequency (VLF) underwater signals can be exploited at long ranges because of their propagation characteristics. In certain scenarios, an important characteristic of VLF broadband energy propagation is the formation of "striated" patterns in the time-frequency domain (i.e., spectrograms) caused by multipath interference. This phenomenon is sometimes referred to as "Lloyd's Mirror" by the sonar community. In [3], acoustic propagation models were demonstrated to predict these VLF multipath interference patterns and a variety of detection methods were developed. In this paper, methods to detect these broadband energy patterns based on a generalized notion of cyclostationarity are presented and demonstrated on realistic simulated data.


2.  Affine  Time  Delay  Multipath  Model 

The observed striation patterns can be modeled by a broadband multipath signal whose time delay between arrivals is varying linearly. This linear relationship could perhaps arise from a variety of environmental factors, but in practice the cause is the motion of the source. In a multipath environment, the received signal $x(t)$ in the time domain can be written as

$x(t) = s(t) + \gamma s(t - \tau(t)) + n(t), \quad 0 < t < T$ (1)

where $s(t)$ is the direct path stationary signal, $s(t - \tau(t))$ is a delayed version of the signal from an alternative path (possibly with a different amplitude), and $n(t)$ is a stationary process representing the aggregate effects of all noise factors.

A first order approximation to the delay function $\tau$ is linear so that $\tau(t) = \alpha t + \beta$. The delay is assumed to be slowly varying so that the delay rate $\alpha$ is small. In order to distinguish the two paths in the model, the initial multipath delay $\beta$ is assumed to be strictly positive. With this simplification, (1) becomes

$x(t) = s(t) + \gamma s(t - \alpha t - \beta) + n(t), \quad 0 < t < T$ (2)

which will be referred to as the affine time delay model. In [2], several applications are discussed where the affine time delay model has been used to model the propagation effects on signals from sources in motion.

We assume the delay rate is sufficiently small that the signal is approximately stationary over short time windows. In particular, the expected amount of energy at frequency $\omega$ at time $t$, denoted by $P(t, \omega)$, is well-approximated by

$P(t, \omega) = (1 + \gamma^2 + 2\gamma \cos(\omega\alpha t + \omega\beta))\, S(\omega) + N(\omega)$ (3)

where $S(\omega)$ and $N(\omega)$ are the power spectral densities of the signal and noise, respectively. For simplicity, this can be written alternatively as

$P(t, \omega) = A(\omega) + B(\omega) \cos(\omega\alpha t + \omega\beta)$ (4)

where $A(\omega)$ represents the energy of the aggregate stationary components and $B(\omega)$ represents the peak energy of the nonstationary or striating components.

In our discussion $P(t, \omega)$ is the spectrogram, that is, the squared magnitude of the short-time Fourier transform (STFT). In principle, however, $P$ could be any time-frequency distribution and could be optimally matched to this model (as is discussed in the conclusions section). This spectrogram model contains interference patterns with peaks along the following family of hyperbolic curves

$\omega\alpha t + \omega\beta = 2\pi k, \quad k = 0, \pm 1, \pm 2, \ldots$ (5)

as is shown in the example in Figure 1 with Nyquist frequency normalized to 1.

Figure 1. Spectrogram for Affine Multipath Delay Model with White Signals and Noise (axis label: Frequency)

The signal $s(t)$ in this example is band-limited white noise with a cutoff frequency equal to Nyquist. The delayed signal is obtained by an initial interpolation using the MATLAB function INTERP with an oversampling factor of 4 and additional resolution using linear interpolation. The results do not change significantly using higher initial oversampling factors. The noise $n(t)$ is white and the SNR is 0 dB, i.e., the noise energy equals that of $s(t)$. The received signal time series $x(t)$ consists of $N = 2^{15} = 32768$ samples over $T = 1$ second. The spectrogram is calculated with 128 nonoverlapping FFTs of length 256, resulting in a 128 x 128 image. The multipath parameters are $\gamma = 1$, $\alpha = -8$ samples/second and $\beta = 32$ samples. So the multipath delay varies from an initial value of $\tau(0) = \beta = 32$ samples to a final value of $\tau(1) = 24$ samples over the one-second interval. Note in Figure 1 that there are $\tau(0)/2 = 16$ peaks at time 0, $\tau(1)/2 = 12$ peaks at time 1, and $|\alpha|/2 = 4$ peaks for frequency 1.
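The peak counts quoted for Figure 1 follow directly from the cosine term in the spectrogram model. The short sketch below is an illustration, not the authors' code. It assumes frequency is measured in radians per sample, so Nyquist is $\pi$, and counts how many full fringe cycles the phase $\omega\tau(t)$ sweeps; the function names are hypothetical.

```python
import numpy as np

gamma, alpha, beta = 1.0, -8.0, 32.0   # example values from the text

def tau(t):
    """Affine delay in samples at time t (seconds)."""
    return alpha * t + beta

def fringe_cycles_over_frequency(t):
    """Cycles of cos(omega * tau(t)) as omega sweeps 0..pi rad/sample (0..Nyquist)."""
    return np.pi * tau(t) / (2 * np.pi)

def fringe_cycles_over_time(omega, T=1.0):
    """Cycles of cos(omega * tau(t)) as t sweeps 0..T at fixed omega."""
    return abs(omega * (tau(T) - tau(0.0))) / (2 * np.pi)

print(fringe_cycles_over_frequency(0.0))   # tau(0)/2
print(fringe_cycles_over_frequency(1.0))   # tau(1)/2
print(fringe_cycles_over_time(np.pi))      # |alpha|/2 at Nyquist
```

Each full cycle of the cosine corresponds to one striation peak, which gives the counts $\tau(0)/2$, $\tau(1)/2$ and $|\alpha|/2$ stated above.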

3.  Generalized  Cyclostationary  Approaches 

Although the process $x(t)$ is nonstationary, it does have some stationarity properties that allow the application of traditional signal processing methods. For each fixed frequency $\omega$, consider the marginal process $Q(t) = Q_\omega(t) = P(t, \omega)$. This process is stationary with significant spectral energy at cyclic frequency $\lambda = \alpha\omega$. Moreover, the cyclic phase of $Q$ is $\beta\omega$. Because the cyclic frequency (or one over the cyclic period) is not constant, but varies with frequency (in this case linearly), $x(t)$ can be called a generalized cyclostationary process.

3.1.  Noncoherent  Detector 

A simple noncoherent detector of the delay rate can be constructed based on these generalized cyclostationary properties. The first step is to estimate the stationary component $A(\omega)$ in (4) and subtract it out of each frequency column. Prewhitening of the received signal can be important in practice, but in our example we assume the signal is already white for the sake of simplicity. The power spectral density can be estimated for each fixed frequency, giving a two dimensional function of frequency $\omega$ and cyclic frequency $\lambda$. Energy integrated over frequency and the cyclic frequency bins corresponding to $\lambda = \alpha\omega$ (i.e., lines through the origin) provides a measure of the marginal likelihood for a given delay rate. These steps are simplified notationally by first defining the cyclostationary gram $C(\alpha, \omega)$ by

$C(\alpha, \omega) = \int_0^T e^{-i\alpha\omega t} P(t, \omega)\, dt$ (6)

which effectively accounts for the linear relationship between cyclic frequency and frequency (i.e., concentrates all the energy along a vertical line). Figure 2 shows the cyclostationary gram for our example. The noncoherent delay rate likelihood function $L(\alpha)$ can now be written as


which  is  shown  in  Figure  3. 
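A minimal numerical sketch of the noncoherent detector follows. It is not the authors' implementation. It works on the noise-free model fringe $B\cos(\omega(\alpha t + \beta))$ with the stationary level already removed (in practice it would have to be estimated), approximates the integral in (6) by a mean over 128 time segments, and assumes that the likelihood is the energy $\sum_\omega |C(\alpha, \omega)|^2$, since the exact expression is not legible in this copy. Frequency is taken in radians per sample, and all variable names are illustrative.

```python
import numpy as np

alpha_true, beta_true = -8.0, 32.0              # example values from the text
t = (np.arange(128) + 0.5) / 128                # segment centres over T = 1 s
w = (np.arange(128) + 0.5) * np.pi / 128        # rad/sample, 0..Nyquist
# Nonstationary part of the model spectrogram, B = 2 (gamma = 1, S = 1).
fringe = 2.0 * np.cos(np.outer(alpha_true * t + beta_true, w))   # shape (time, freq)

rates = np.arange(-16, 17)                      # candidate delay rates, samples/s

# Cyclostationary gram: C(a, w) ~ mean over t of exp(-i a w t) * fringe(t, w).
C = np.array([(np.exp(-1j * a * np.outer(t, w)) * fringe).mean(axis=0) for a in rates])

# Assumed noncoherent statistic: energy of each row of the gram.
L_noncoherent = np.sum(np.abs(C) ** 2, axis=1)

best = rates[np.argmax(L_noncoherent)]
print(abs(best))
```

Because the fringe is real, $C(-a, \omega)$ is the complex conjugate of $C(a, \omega)$, so this statistic is exactly symmetric in the candidate rate and only $|\alpha|$ can be recovered.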

3.2.  Two-parameter  Coherent  Detector 

Figure 2. Cyclostationary gram taking into account linear dependency of cyclic frequency on frequency (axis label: Normalized Frequency).

Figure 3. Noncoherent likelihood function for multipath delay rate parameter $\alpha$.

Note that $L(\alpha)$ is symmetric about 0. This points out one inadequacy of this approach: only the absolute value of the delay rate is detected, and there is no discrimination between sources that are opening from and closing on the receiver at the same speed. We therefore consider a coherent detection approach with a two-parameter joint likelihood function $L(\alpha, \beta)$ which incorporates the cyclic phase


L{a,/3) 


(8) 


which can be interpreted as the Fourier transform of the columns of the complex cyclostationary gram. The joint likelihood for our example is shown in the right image of Figure 4. The peak is clearly evident at $(-8, 32)$, corresponding to the correct values of our model.


Figure 4. Joint two-parameter likelihood function for $\alpha$ and $\beta$.

A coherent marginal likelihood function for the delay rate can be found by integrating out the $\beta$ parameter and is shown in Figure 5. The coherent detector is now able to distinguish between positive and negative values of the delay rate.
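The coherent detector can be sketched in the same noise-free setting. This is a sketch under assumptions rather than the authors' method. The joint statistic is taken to be the squared magnitude of the Fourier transform over frequency of each row of the complex cyclostationary gram, which matches the verbal description above, though the exact formula (8) is not legible in this copy. Candidate delays are restricted to $\beta > 0$, as the model requires, and all names are illustrative.

```python
import numpy as np

alpha_true, beta_true = -8.0, 32.0              # example values from the text
t = (np.arange(128) + 0.5) / 128                # segment centres over T = 1 s
w = (np.arange(128) + 0.5) * np.pi / 128        # rad/sample, 0..Nyquist
fringe = 2.0 * np.cos(np.outer(alpha_true * t + beta_true, w))   # B = 2 (gamma = 1, S = 1)

rates = np.arange(-16, 17)                      # candidate delay rates, samples/s
delays = np.arange(1, 65)                       # candidate initial delays, samples (beta > 0)

# Complex cyclostationary gram, one row per candidate rate.
C = np.array([(np.exp(-1j * a * np.outer(t, w)) * fringe).mean(axis=0) for a in rates])

# Assumed joint statistic: |sum over w of exp(-i b w) C(a, w)|^2.
F = np.exp(-1j * np.outer(w, delays))           # shape (freq, delays)
L_joint = np.abs(C @ F) ** 2                    # shape (rates, delays)

i, j = np.unravel_index(np.argmax(L_joint), L_joint.shape)
alpha_hat, beta_hat = rates[i], delays[j]

# Coherent marginal for the delay rate: sum the joint statistic over delays.
L_marginal = L_joint.sum(axis=1)
print(alpha_hat, beta_hat, rates[np.argmax(L_marginal)])
```

Keeping the phase of the gram is what breaks the sign ambiguity: the mirror-image peak sits at $(-\alpha, -\beta)$, which the constraint $\beta > 0$ excludes.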

4.  Conclusions  and  Future  Directions 

Coherent and noncoherent methods based on a generalized notion of cyclostationarity can be constructed to detect broadband multipath signals with affine time delay. A result demonstrated in this paper (and widely known in radar and other similar applications) is that while noncoherent methods cannot distinguish between positive and negative delay rates, coherent methods determine the sign in addition to an estimate of the initial multipath delay.

Figure 5. Coherent detector obtained by integrating joint likelihood function over $\beta$.

Several possible research directions for this model and these detection methods are apparent. The affine time delay model can be generalized by letting $\tau$ be an arbitrary slowly varying function (i.e., small $\tau'$). The use of alternative time-frequency distributions, especially ones matched to the hyperbolic patterns (see [4] and [1]), is worth investigating. The detection performance of these methods also needs to be assessed for varying parameters (e.g., $\lambda$) and multiple signals with possibly more than two paths.
ABSTRACT

The periodogram is a useful tool for revealing hidden periodicities in a given time series, but the resulting spectral lines have often been associated with constant amplitude harmonics. The harmonics may instead have non-zero mean random (as opposed to constant) amplitudes, because the two models can have identical periodograms. Applications exist to support the random amplitude models. Cyclic statistics are employed here as effective tools to distinguish constant from random amplitude harmonic models. The algorithms are FFT based and are easy to implement, as illustrated by numerical examples.


1.  INTRODUCTION 

Detection of hidden periodicities embedded in a random process has been a concern for over 100 years. Schuster in 1894 devised the periodogram as a means of searching for hidden periodicities. It has had much success in many areas ranging from seasonal and economic time series, seismology, geophysics, spectroscopy, and communications to sonar and radar signal processing (see e.g., [2], [1], [5] and references therein). In this paper, we consider a single harmonic which may be "hidden" in a discrete-time series $\{x(t)\}_{t=0}^{T-1}$. The periodogram of $x(t)$ is defined as

$I_{2x}(\alpha) = \frac{1}{T} |X_T(\alpha)|^2$ (1)

$X_T(\alpha) = \sum_{t=0}^{T-1} x(t) e^{-j\alpha t}$ (2)

where (2) is simply the DFT of the data $\{x(t)\}_{t=0}^{T-1}$.

If the periodogram shows a peak at $\omega_0$, one tends to believe that $x(t)$ is of the form

$x(t) = A e^{j(\omega_0 t + \phi_0)} + v(t)$ (3)

where $A$, $\omega_0$, $\phi_0$ are deterministic constants and $v(t)$ is stationary additive noise. On the other hand, if (1) does not show any peak, one is tempted to say that $x(t)$ is stationary.

These are pitfalls to which researchers easily fall prey, and they are the subject of this paper. Our purpose here is to clarify that (i) when the periodogram exhibits a peak at $\omega_0$, $x(t)$ can also be of the form

$x(t) = s(t) e^{j(\omega_0 t + \phi_0)} + v(t)$ (4)

where $s(t)$ is an ergodic random process with mean $m_s = E[s(t)] \ne 0$ and is assumed to be uncorrelated with $v(t)$. We refer to (3) as the constant amplitude harmonic model, and (4) as the random amplitude harmonic model. Alternatively, we will also call (4) a harmonic in multiplicative and additive noise. Note that (3) can be regarded as a special case of (4) with $s(t) = A$.

Another point that we wish to clarify is that (ii) when the periodogram does not show any peak, it is still possible for $x(t)$ to obey (4) but with $m_s = 0$. The goal of this paper is to provide tools that can distinguish stationary processes such as $v(t)$, constant amplitude harmonics (3), and random amplitude harmonics (4), using cyclic statistics.

Random amplitude harmonics show up in a variety of applications. In radar processing, when a non-point target is fast maneuvering or scintillating, the resulting harmonic (due to Doppler shift) carries a random amplitude [6]. In underwater acoustics, when the medium (the ocean) is dispersive or fluctuating, the sonar return also experiences some random amplitude effect [3]. The model in (4) is also appropriate for Doppler weather radar/lidar returns, where $s(t)$ is due to the randomness in the scatterers (hydrometeors or aerosol particles). Due to carrier modulation, (4) is suitable for communications signals as well.

We wish to point out that it is important to identify the correct model for at least the following reasons: 1) whether the harmonic has random or constant amplitude reveals partial information about the source (target), such as scattering or fading; 2) the Cramér-Rao bounds on the parameter estimates are different for the two models [7]; 3) the corresponding maximum likelihood (ML) estimates are also different. For example, in the nonzero mean ($m_s \ne 0$) case and when $v(t)$ is zero-mean white Gaussian, estimates $\hat{m}_s$ (or $\hat{A}$), $\hat{\omega}_0$, $\hat{\phi}_0$ obtained by minimizing the mean square error between $x(t)$ and $m_s e^{j(\omega_0 t + \phi_0)}$ are ML when $s(t) = A$ but are not when $s(t)$ is random [8]. Therefore, by assuming the wrong model, one may intend to obtain ML estimates but fail to do so.

We will show that (3) and (4) can have identical periodograms when $m_s \ne 0$. However, when a peak is detected in the periodogram, one easily tends to believe that the true model is (3), and the possibility of (4) being present is usually overlooked.

The  rest  of  the  paper  is  organized  as  follows:  in  Section 
2  we  examine  the  cyclic  statistics  of  (3)  and  (4)  and  devise 
algorithms  to  distinguish  the  two.  Some  practical  aspects 
of  the  algorithms  are  discussed  in  Section  3.  We  use  sim¬ 
ulated  data  to  illustrate  the  procedures  in  Section  4  and 
draw  conclusions  in  Section  5. 

2.  RETRIEVAL  USING  CYCLIC  STATISTICS 

The processes in (3) and (4) are called wide-sense cyclostationary because their mean or variance are periodic functions of time. The mean of (3) is given by

$m_{1x}(t) = E[x(t)] = A e^{j(\omega_0 t + \phi_0)} + m_v$ (5)

whereas the mean of (4) is

$m_{1x}(t) = m_s e^{j(\omega_0 t + \phi_0)} + m_v$ (6)

with $m_v = E[v(t)]$. If $\omega_0 \ne 0$ mod $(2\pi)$, we realize from both (5) and (6) that $\lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} m_{1x}(t) = m_v$; hence one can always remove the time average of $x(t)$ to equivalently remove the mean of $v(t)$. W.l.o.g., we henceforth assume that $m_v = 0$ and rewrite

$m_{1x}(t) = A e^{j(\omega_0 t + \phi_0)}$ for (3), (7)

$m_{1x}(t) = m_s e^{j(\omega_0 t + \phi_0)}$ for (4). (8)

Since (7) and (8) are periodic functions of $t$, we consider their Fourier Series (FS) coefficients, which we call the cyclic mean [4],

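$M_{1x}(\alpha) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} m_{1x}(t) e^{-j\alpha t}.$ (9)

For (7) and (8), they are given by

$M_{1x}(\alpha) = A e^{j\phi_0} \delta(\alpha - \omega_0),$ (10)

$M_{1x}(\alpha) = m_s e^{j\phi_0} \delta(\alpha - \omega_0),$ (11)

respectively, where $\delta(\cdot)$ is the Kronecker delta function. A consistent sample estimate of $M_{1x}(\alpha)$ is given by [4]

$\hat{M}_{1x}(\alpha) = \frac{1}{T} \sum_{t=0}^{T-1} x(t) e^{-j\alpha t},$ (12)

which is simply the normalized DFT of the data.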

From (10) and (11), we see that if $m_s \neq 0$, a plot of
$|\hat M_{1x}(\alpha)|$ will show a peak at $\alpha = \omega_0$ for both (3) and
(4), the location of which provides an estimate of $\omega_0$. The
phase at the peak, $\arg[\hat M_{1x}(\hat\omega_0)]$, gives $\hat\phi_0$, and the peak
strength $|\hat M_{1x}(\hat\omega_0)|$ yields an estimate for $A$ or $m_s$. We
proved  in  [8]  that  these  estimates  are  consistent  with  the 
following  variance  rates:  var((I>o)  =  0{T  ^),  var(ms)  = 

0(T”^),  var(A)  =  and  vax(0o)  = 

It is easy to see that the periodogram in (1) is related
to (12) as follows: $I_{2x}(\alpha) = T |\hat M_{1x}(\alpha)|^2$. Hence, a peak
in $|\hat M_{1x}(\alpha)|$ is equivalent to a peak in $I_{2x}(\alpha)$ at the same
location. In this sense, cyclic mean and periodogram are
equivalent. However, the latter does not contain phase
information.
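
The relation between (12) and the periodogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the grid search over $\alpha_k = 2\pi k/T$, and the noise-free test signal (with $\omega_0$ placed on the grid) are hypothetical choices, not taken from the paper.

```python
import cmath
import math


def sample_cyclic_mean(x, alpha):
    """Eq. (12): (1/T) * sum_{t=0}^{T-1} x(t) * exp(-j*alpha*t)."""
    T = len(x)
    return sum(xt * cmath.exp(-1j * alpha * t) for t, xt in enumerate(x)) / T


def peak_estimates(x, n_grid):
    """Search the grid alpha_k = 2*pi*k/n_grid, k = 1..n_grid-1, for the
    largest |M1x(alpha)| and return (frequency, phase, strength)."""
    best = None
    for k in range(1, n_grid):
        alpha = 2.0 * math.pi * k / n_grid
        m = sample_cyclic_mean(x, alpha)
        if best is None or abs(m) > abs(best[1]):
            best = (alpha, m)
    alpha_hat, m_hat = best
    return alpha_hat, cmath.phase(m_hat), abs(m_hat)


# Hypothetical noise-free example: constant amplitude harmonic, model (3).
T = 256
w0 = 2.0 * math.pi * 40 / T          # assumption: w0 lies on the search grid
x = [1.2 * cmath.exp(1j * (w0 * t + 0.6)) for t in range(T)]

w_hat, phi_hat, amp_hat = peak_estimates(x, n_grid=T)
periodogram_at_peak = T * amp_hat ** 2   # I2x(alpha) = T * |M1x(alpha)|^2
print(w_hat, phi_hat, amp_hat, periodogram_at_peak)
```

The peak location, phase, and strength play the roles of $\hat\omega_0$, $\hat\phi_0$, and $\hat A$ (or $\hat m_s$); the periodogram value carries the same magnitude information but not the phase.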

Because the cyclic mean peaks at $\omega_0$ for both (3) and (4)
when $m_s \neq 0$, the two models cannot be distinguished from it.
However, their respective variances tell the difference:
$\sigma_x^2(t) = \sigma_v^2$ for (3), and
$\sigma_x^2(t) = \sigma_s^2 \exp\{2j(\omega_0 t + \phi_0)\} + \sigma_v^2$ for (4),
where $\sigma_v^2$ denotes the variance of $v(t)$ and similarly for $\sigma_s^2$.
We term the FS coefficient of $\sigma_x^2(t)$ the cyclic variance of
$x(t)$; it is given by

$$C_{2x}(\alpha) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \sigma_x^2(t) e^{-j\alpha t}, \qquad (13)$$

which equals $\sigma_v^2 \delta(\alpha)$ for (3), and
$\sigma_s^2 e^{j2\phi_0} \delta(\alpha - 2\omega_0) + \sigma_v^2 \delta(\alpha)$ for (4).
It is this quantity that reveals the difference between the two
models (3) and (4): a peak at $\alpha \neq 0$
hints towards the random amplitude model (4).


Note that we can also use the cyclic covariance of $x(t)$,
defined as the FS coefficient of $\mathrm{cov}\{x(t), x(t + \tau)\}$ w.r.t. $t$,
at lag $\tau \neq 0$, to achieve similar results. But cyclic variance
is slightly easier to implement.

Under (4) and for (13), the peak at $\alpha = 2\omega_0$ relies on
$\sigma_s^2 \neq 0$ to be visible. We always have $\sigma_s^2 > 0$ when $s(t)$
is a real random process. However, when $s(t)$ is complex,
$\sigma_s^2 = 0$ may happen; this is the case for QAM processes,
for example. The fourth-order cyclic statistic proposed in
[4] resolves the problem.

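A sample estimate of (13) is given by

$$\hat C_{2x}(\alpha) = \frac{1}{T} \sum_{t=0}^{T-1} \left[ x(t) - \hat m_s e^{j(\hat\omega_0 t + \hat\phi_0)} \right]^2 e^{-j\alpha t}, \qquad (14)$$

which is the normalized DFT of the square of the
mean-compensated process. For constant amplitude harmonics,
we simply replace $\hat m_s$ by $\hat A$ in the above formula.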

Therefore we obtain the first result: If the cyclic mean
of $x(t)$, or the periodogram of $x(t)$, shows a peak at $\omega_0$,
we need to further compute the cyclic variance of $x(t)$, or
the periodogram of $[x(t) - \hat m_{1x}(t)]^2$, in order to
distinguish (3) and (4). If the resulting quantity shows a peak
at $2\omega_0$, then the model is (4); otherwise, (3) is in force.
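
A minimal Python sketch of this decision step follows. It is an illustration only: the function name is hypothetical, the harmonic parameters are taken as known rather than estimated, and a deterministic alternating amplitude stands in for a random $s(t)$ so that the output is reproducible.

```python
import cmath
import math


def sample_cyclic_variance(x, m_hat, w_hat, phi_hat, alpha):
    """Normalized DFT at cycle alpha of the squared mean-compensated data,
    [x(t) - m_hat * exp(j*(w_hat*t + phi_hat))]**2 (no conjugation)."""
    T = len(x)
    total = 0j
    for t, xt in enumerate(x):
        residual = xt - m_hat * cmath.exp(1j * (w_hat * t + phi_hat))
        total += residual ** 2 * cmath.exp(-1j * alpha * t)
    return total / T


# Hypothetical noise-free signals with known harmonic parameters.
T = 256
w0 = 2.0 * math.pi * 20 / T
phi0 = 0.6
carrier = [cmath.exp(1j * (w0 * t + phi0)) for t in range(T)]

# Stand-in for a random amplitude: s(t) alternates 1.2 +/- 0.5, so it has
# time average 1.2 and squared deviation 0.25 at every t.
x_random = [(1.2 + 0.5 * (-1) ** t) * c for t, c in enumerate(carrier)]
x_const = [1.2 * c for c in carrier]          # model (3) with A = 1.2

c_random = sample_cyclic_variance(x_random, 1.2, w0, phi0, 2.0 * w0)
c_const = sample_cyclic_variance(x_const, 1.2, w0, phi0, 2.0 * w0)
print(abs(c_random), abs(c_const))
```

In this toy setting the value at $\alpha = 2\omega_0$ is clearly nonzero for the varying amplitude and vanishes for the constant one; with real data the comparison would be against the noise floor rather than against zero.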

Now let us see what happens if $\hat M_{1x}(\alpha)$ does not show any
peak at all. This implies that the possibility for the
constant amplitude harmonic model (3) is ruled out. Our task
here becomes deciding whether $x(t)$ is a purely stationary
process or model (4) with $m_s = 0$.

To resolve this problem, we again compute the cyclic
variance. The cyclic variance of a stationary process shows a
single peak at $\alpha = 0$, whereas (4) shows an additional peak
at $\alpha = 2\omega_0$. The following observation is made: If the cyclic
mean of $x(t)$, or the periodogram of $x(t)$, shows no peak,
then we rule out the possibility of (3). We then compute
the cyclic variance of $x(t)$, or the periodogram of $x^2(t)$. If
the latter shows a peak at $\alpha \neq 0$, then $x(t)$ is due to (4).

We  want  to  point  out  however,  that  it  is  possible  to  design 
a  rigorous  statistical  test  to  decide  on  the  peaks,  and  this 
constitutes  an  interesting  future  research  direction. 

When there are multiple harmonics present and we wish
to decide between the models
$x(t) = \sum_{l=1}^{L} A_l e^{j(\omega_l t + \phi_l)} + v(t)$
and $x(t) = \sum_{l=1}^{L} s_l(t) e^{j(\omega_l t + \phi_l)} + v(t)$, where
$A_l, \omega_l, \phi_l$ are deterministic constants and $\{s_l(t)\}_{l=1}^{L} \cup \{v(t)\}$
are mutually uncorrelated, the cyclic mean exhibits peaks at
$\{\omega_l\}_{l=1}^{L}$ for both models if $E[s_l(t)] \neq 0$. However, the cyclic
variance only peaks at $\alpha = 0$ for the constant amplitude
model but shows peaks at $\{0\} \cup \{2\omega_l\}_{l=1}^{L}$ for the random
amplitude model instead. Hence the cyclic algorithms of
this section still apply for multicomponent harmonics.

3.  PRACTICAL  CONSIDERATIONS 

When $x(t)$ is zero-mean, the cyclic variance can be
easily computed as the normalized DFT of the data squared.
However, the nonzero-mean case is more interesting because
this is where one could be confused between random and
constant amplitude harmonic models. It is also more
cumbersome since one has to first estimate the time-varying
mean $m_{1x}(t)$, remove it, and then compute the cyclic
variance. Estimates $\hat m_s$ (or $\hat A$), $\hat\omega_0$, and $\hat\phi_0$ can all be computed
from $\hat M_{1x}(\alpha)$ to form $\hat m_{1x}(t)$. But if the data length is
short, these estimates may not be very accurate. As a
result, spurious peaks may occur in the cyclic variance of (3)
at $\alpha \neq 0$ due to the "residue" harmonic not being completely
removed. This may hamper the performance of the detection
scheme based on the cyclic variance.

The following alternative can be considered, which avoids
the aforementioned problem and requires somewhat less
computation. We first compute the sample cyclic mean and,
if we detect a peak at $\alpha \neq 0$, we record that peak strength
and denote it as $\hat m_s$. Next, instead of the variance, we
consider the mean square of $x(t)$,

$$m_{2x}(t) = E[x^2(t)] = m_{2s} e^{j2(\omega_0 t + \phi_0)} + \sigma_v^2, \qquad (15)$$

where $m_{2s} = E[s^2(t)] = \sigma_s^2 + m_s^2$. When $s(t) = A$, we
simply replace $m_{2s}$ by $A^2$.

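Since (15) is a periodic function of $t$, we consider its FS
coefficient, which we term the cyclic mean square,

$$M_{2x}(\alpha) = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} m_{2x}(t) e^{-j\alpha t} \qquad (16)$$

$$= m_{2s} e^{j2\phi_0} \delta(\alpha - 2\omega_0) + \sigma_v^2 \delta(\alpha), \qquad (17)$$

whose consistent sample estimate is given by

$$\hat M_{2x}(\alpha) = \frac{1}{T} \sum_{t=0}^{T-1} x^2(t) e^{-j\alpha t}. \qquad (18)$$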

Now for both (3) and (4) with $m_s \neq 0$, (18) will show
peaks at $\alpha = 0$ and $\alpha \neq 0$. Denote the peak strength at
$\alpha \neq 0$ as $\hat m_{2s}$ and compute $\hat\sigma_s^2 = \hat m_{2s} - \hat m_s^2$. If $\hat\sigma_s^2$ is close
to zero, we decide that (3) is more appropriate; otherwise,
we choose (4). The rationale is that $\sigma_s^2 = m_{2s} - m_s^2$ is
nonzero for $s(t)$ random, and $\sigma_s^2 = 0$ for $s(t) = A$. Of
course, an interesting research problem here is to develop
a statistical test in order to decide on the zeroness of the
random variable $\hat\sigma_s^2$.
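
The cyclic mean square route can be sketched as follows. This is a hedged illustration: the function names and the grid search over nonzero cycles are assumptions, and the noise-free signals use a deterministic alternating amplitude as a reproducible stand-in for a random $s(t)$.

```python
import cmath
import math


def peak_strength(y, n_grid):
    """Largest |(1/T) * sum_t y(t) * exp(-j*alpha*t)| over the nonzero
    cycles alpha_k = 2*pi*k/n_grid, k = 1..n_grid-1."""
    T = len(y)
    best = 0.0
    for k in range(1, n_grid):
        alpha = 2.0 * math.pi * k / n_grid
        coeff = sum(yt * cmath.exp(-1j * alpha * t) for t, yt in enumerate(y)) / T
        best = max(best, abs(coeff))
    return best


def amplitude_variance_estimate(x, n_grid):
    """Estimate of sigma_s^2 as m_2s - m_s**2 from the two peak strengths."""
    m_s = peak_strength(x, n_grid)                       # cyclic mean peak
    m_2s = peak_strength([xt * xt for xt in x], n_grid)  # cyclic mean square peak
    return m_2s - m_s ** 2


# Hypothetical noise-free signals; the frequency is placed on the grid.
T = 128
w0 = 2.0 * math.pi * 10 / T
carrier = [cmath.exp(1j * (w0 * t + 0.6)) for t in range(T)]
x_random = [(1.2 + 0.5 * (-1) ** t) * c for t, c in enumerate(carrier)]
x_const = [1.2 * c for c in carrier]

print(amplitude_variance_estimate(x_random, T))
print(amplitude_variance_estimate(x_const, T))
```

Neither $\omega_0$ nor $\phi_0$ is estimated explicitly here; only the two peak magnitudes are used, which is the point of this variant.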

We wish to point out that the value of $\hat\sigma_s^2$, obtained either
from $\hat m_{2s} - \hat m_s^2$ using the cyclic mean and mean square, or
from the peak strength of the cyclic variance at $\alpha = 2\omega_0$,
may be used as a measure of dispersion or fading in Doppler
radar or sonar applications.

The advantage of the cyclic mean square approach is that
one can avoid estimating $\omega_0$ and $\phi_0$. However, the difference
between the models (3) and (4) is revealed more numerically
than graphically.

4.  SIMULATIONS 

We illustrate the algorithms proposed in this paper using
simulated data. The following specifications apply to all
examples: $T = 512$, $\omega_0 = 1$, $\phi_0 = 0.6$. In addition, the additive
noise $v(t)$ is a zero-mean uniformly distributed process with
variance $\sigma_v^2 = 0.5$.

Example 1: Consider $x_1(t)$, which is given by (4) with i.i.d.
Gaussian $s(t)$ having $m_s = 1.2$ and $\sigma_s^2 = 0.4$, and $x_2(t)$,
which is given by (3) with $A = 1.2$. The real parts of $x_1(t)$
and $x_2(t)$ are shown in Figs. 1a and 1b, and the sample
cyclic means are shown in Figs. 1c and 1d, respectively. It
is difficult to classify $x_1(t)$ and $x_2(t)$ into (3) or (4) using
the figures obtained so far. From $\hat M_{1x_1}(\alpha)$ and $\hat M_{1x_2}(\alpha)$ we
obtained $\hat m_s = 1.1902$, $\hat\omega_0 = 0.9999$ for both $x_1(t)$ and
$x_2(t)$, $\hat\phi_0 = 0.6371$ for $x_1(t)$ and $\hat\phi_0 = 0.6367$ for $x_2(t)$. We
then subtracted the respective $\hat m_{1x}(t) = \hat m_s e^{j(\hat\omega_0 t + \hat\phi_0)}$ from
$x_1(t)$ and $x_2(t)$, squared the resulting quantities, and took
their normalized DFT, the magnitudes of which are plotted
in Figs. 1e and 1f. A distinct extra peak occurred at $\alpha = 2$
in Fig. 1e, and we therefore ascribe $x_1(t)$ to (4) and $x_2(t)$
to (3).

Alternatively, we can bypass the estimation of $\omega_0$ and $\phi_0$
by adopting the approach in Section 3. From Figs. 1c and
1d, we estimated the peak strength at $\alpha \neq 0$ to be $\hat m_s =
1.1902$. We then computed the sample cyclic mean square
$\hat M_{2x}(\alpha)$, the magnitudes of which are shown in Figs. 1g
and 1h for $x_1(t)$ and $x_2(t)$ respectively. The peak strength
at the nonzero cycle yielded $\hat m_{2s} = 1.7798$ for $x_1(t)$ and
$\hat m_{2s} = 1.4234$ for $x_2(t)$, from which we inferred $\hat\sigma_s^2 = 0.3632$
for $x_1(t)$ and $\hat\sigma_s^2 = 0.0068$ for $x_2(t)$. Since the latter can be
regarded as statistically zero, $x_2(t)$ is attributed to (3) and
$x_1(t)$ to (4).

Example 2: We consider here the case with $m_s = 0$. All
other parameters remain the same as in Example 1. The
real parts of the time series are plotted in Figs. 2a and 2b.
The sample cyclic mean does not show a dominant peak for
$x_1(t)$ (Fig. 2c) but does so for $x_2(t)$ (Fig. 2d). The sample
cyclic variance magnitudes are plotted in Figs. 2e and 2f
for $x_1(t)$ and $x_2(t)$ respectively. The extra peak at $\alpha \neq 0$
in Fig. 2e distinguishes $x_1(t)$ from $x_2(t)$. The sample cyclic
mean square magnitudes are plotted in Figs. 2g and 2h,
from which $\hat\sigma_s^2 = 0.4211$ and $\hat\sigma_s^2 = 0.0086$ were estimated for
$x_1(t)$ and $x_2(t)$ respectively. Since the latter is statistically
zero, we decide that $x_2(t)$ came from (3).

5.  CONCLUSIONS 

Our focus here has been on random and constant amplitude
harmonics. Traditionally, one examines the periodogram
and, based on the presence or absence of a peak, decides
whether the process contains a constant amplitude
harmonic or is purely stationary. We argue here that in both
cases a random amplitude harmonic could be present,
either with non-zero mean or with zero-mean random
amplitude. By employing the cyclic variance or cyclic mean
square, one can easily tell the difference between the
following pairs: random (with $m_s \neq 0$) vs. constant
amplitude harmonics, and random amplitude harmonics (with
$m_s = 0$) vs. a purely stationary process. Simulation
studies corroborate these findings. Rigorous statistical tests lie
ahead as further studies, and the results can be easily
extended to multicomponent processes.

Figure 2. A zero-mean random amplitude harmonic and a constant amplitude harmonic

Abstract

In this paper we introduce a new time-frequency based
method for classifying non-stationary random signals. The
method involves dividing the signal into overlapping or
non-overlapping segments considered to be sub-populations of
the entire population. From each sub-population we
calculate a test statistic which can be used to construct a single
hypothesis test. To control the global type-I error it is
necessary to consider the hypotheses from all sub-populations
simultaneously. We use the generalised sequentially
rejective Bonferroni multiple hypothesis test, which provides an
efficient method to simultaneously test multiple
hypotheses while maintaining the global type-I error. Finally, we
show the results of classifying time-dependent AR(1)
processes which have identical expected instantaneous power
and power spectral densities but different time-frequency
representations.


1.  Introduction 

The  problem  of  signal  classification  can  be  divided  into 
three  consecutive  sub-problems:  detection  of  the  presence 
of  a  signal;  segmentation  to  determine  the  time  interval  of 
the  signal;  and  classification  of  the  signal  into  one  of  a  finite 
number  of  classes.  In  this  work  we  will  focus  on  classifying 
an  observation  signal  into  one  of  two  classes,  i.e.,  we  assume 
that  the  signal  is  present  and  its  time  interval  is  known. 

The  original  contribution  of  this  paper  involves  the  exten¬ 
sion  of  a  frequency  domain  classifier  for  stationary  signals 
[7]  to  a  time-frequency  classifier  for  non-stationary  signals. 
The  motivation  for  this  extension  is  straightforward:  the 
classical  technique  is  only  optimal  (in  the  sense  of  minimis¬ 
ing  the  probability  of  misclassifying  an  observation  of  one 
kind  for  a  fixed  misclassification  rate  of  the  other  kind)  if 
the  signal  is  stationary.  This  leads  us  to  consider  a  technique 
that  does  not  require  the  signal  to  be  stationary.  In  particular, 


we introduce a time-varying quadratic discriminant function
using the spectrogram. We apply the generalised
sequentially rejective Bonferroni test to the multiple hypotheses
that can be constructed at different points in time from this
discriminant function.

Other classification techniques have been suggested
recently using time-frequency distributions (TFD). In [8]
the authors extended the log-spectral distance to the
time-frequency case and in [3] the authors proposed a technique
based on the cross Wigner-Ville distribution. In the sequel
we will discuss how our method deviates from the existing
solutions.

2.  Time-frequency  discrimination 

In this paper we develop the theory for the simplest
case of classifying the signal into one of two classes. It
is straightforward to extend the results to a larger number
of classes. To classify a signal into one of two classes we
formulate the test

H: X = S_1 + U
K: X = S_2 + U

where $S_1$ and $S_2$ are zero mean non-stationary Gaussian
signals and $U$ is zero mean white Gaussian noise. A
discrete time-frequency distribution of a random vector
$X = [X_1, \ldots, X_N]'$ is defined as [2]

$$S_X(n,k) = \sum_{m=-(N-1)/2}^{(N-1)/2} R_{XX}(n,m) e^{-j2\pi mk/N} \qquad (1)$$

for $n \in [0, N-1]$, where $R_{XX}(n,m)$ is the time-dependent
covariance of the signal. In this case we choose $R_{XX}(n,m)$
such that the resulting TFD is the spectrogram; however, in
general, this discriminant function is applicable for any TFD.
For the case of classifying a signal into one of two classes,
we define the time-dependent discriminant:

$$d(x,n) = \sum_{k=0}^{N-1} \hat S_x(n,k) \left( S_2^{-1}(n,k) - S_1^{-1}(n,k) \right) \qquad (2)$$

where: $\hat S_x(n,k)$ is an estimate of the TFD from $x =
[x_1, x_2, \ldots, x_N]'$, a realisation of $X$; $S_q(n,k)$, $q = 1, 2$, are
estimates of the TFDs representing the two different classes
and are assumed to be non-zero; and $k = 0, \ldots, N-1$
is discrete frequency (assuming $x$ is analytic). This
time-dependent discriminant function can be interpreted as an
extension of the power spectrum quadratic discriminant
function defined for stationary random processes in [7].
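
As a small illustration of Eq (2), the following Python sketch evaluates the discriminant at every time index from TFDs stored as nested lists. The function name, the list-of-rows layout, and the 2-by-2 example values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def discriminant(S_x, S_1, S_2):
    """Eq. (2): d(x, n) = sum_k S_x[n][k] * (1/S_2[n][k] - 1/S_1[n][k]).

    Each argument is a list of rows indexed by time n; each row is a list
    over discrete frequency k. Class TFDs must be non-zero everywhere."""
    d = []
    for row_x, row_1, row_2 in zip(S_x, S_1, S_2):
        d.append(sum(sx * (1.0 / s2 - 1.0 / s1)
                     for sx, s1, s2 in zip(row_x, row_1, row_2)))
    return d


# Hypothetical 2-by-2 TFDs (two time instants, two frequency bins).
S_1 = [[1.0, 2.0], [4.0, 1.0]]
S_2 = [[2.0, 4.0], [1.0, 4.0]]
S_x = [[1.0, 2.0], [4.0, 1.0]]   # observation that matches class 1 exactly
print(discriminant(S_x, S_1, S_2))
```

The function returns one value per time index, which is what allows a separate hypothesis to be formed at each time rather than a single time-integrated statistic.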

In general, existing classification algorithms form a
single test statistic from a discriminant function and this is
used to perform a single hypothesis test to determine if the
observation belongs to class 1 or class 2. In our case the
discriminant function given by Eq (2) returns a value at each
time. Each value is used to construct a hypothesis, and the
hypotheses are then combined and treated simultaneously. This
approach differs from previous time-frequency based methods [1,3,8]
where the solutions all involve integration over time to form
a single hypothesis, which can lead, in practical situations,
to misclassification.

Smoothing. If there are zero terms in the TFDs of the
population they will dominate Eq (2). To reduce this problem,
and to lower the variance of the estimates, the discriminant
function can be evaluated using a smoothed TFD

$$\bar S_x(n,k) = \sum_{m,l} W(n+m, k+l)\, \hat S_x(n+m, k+l) \qquad (3)$$

where $W(n,k)$ is an appropriate window [2].

3.  Multiple  hypotheses 

Multiple  comparison  procedures  provide  a  technique  for 
simultaneously  treating  a  collection  of  separate  tests  derived 
from  sub-populations,  while  maintaining  a  global  level  of 
significance. If the level of significance for each individual
test is set at $\alpha$, then the global level of significance may be
much higher [4]. In the time-frequency setting the signal is
divided up into non-overlapping or overlapping segments.
Each  segment  is  a  sub-population  for  which  a  test  statistic 
can  be  derived  and  a  hypothesis  test  can  be  constructed. 
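
To see how quickly the global level can grow, the short Python sketch below computes the probability of at least one false rejection when every local test is run at the same level. It assumes the local tests are statistically independent, which is an assumption made here for illustration and not a claim of the paper; the function name is hypothetical.

```python
def global_level_independent(alpha, n_tests):
    """Probability of at least one false rejection among n_tests independent
    tests, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 4, 16, 64):
    print(n_tests, global_level_independent(0.05, n_tests))
```

Under this independence assumption, a local level of 0.05 already gives a global level above one half for 16 segments, which motivates a procedure that controls the global level directly.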

In  the  following  section  we  discuss  the  generalised  se¬ 
quentially  rejective  Bonferroni  test  which  controls  the  global 
level  of  significance. 

3.1. Generalised sequentially rejective Bonferroni
test (GSRBT)

The GSRBT was successfully applied to a signal
processing problem in [9]. Eq (2) is defined for all time samples;
however, we are using the spectrogram, so we only evaluate
$d(x, n)$ at the centre of the window. We use the statistic
$D_i = d(X, (2i - 1)M/2)$, where $M$ is the size of the
sub-population or the spectrogram window length (no overlap).
If  the  signal  is  from  class  q  then  M~^Di  is  normal  and 
estimates  of  the  mean  and  variance  are  given  by  [7] 

Jfe=0 

-  SrHii  -  l/2)M,A:))5x((i  -  l/2)M,k)  (4) 
and 

M-l 

-  5r'((i  -  -  l/2)M,k)\S> 

If the smoothed TFD from Eq (3) is used then Eqs (4) and (5)
will need to be adjusted according to the chosen window.
Each local test can be constructed as testing
$H_i: D_i \sim N(m_{1i}, \sigma_{1i}^2)$ against the alternative
$K_i: D_i \sim N(m_{2i}, \sigma_{2i}^2)$,
for $i = 1, \ldots, P$, where $P$ is the number of test statistics.
However, as previously discussed, we need to test all $P$
hypotheses simultaneously. To do this we use the GSRBT
as follows:

1. Calculate the p values, i.e., the probability that $D_i$
exceeds its observed value under $H_i$. The p values
are calculated as $P_i = 1 - \Phi((d_i - \hat m_{1i})/\hat\sigma_{1i})$, where
we assume $\Phi(y)$ is the normal cumulative
distribution function since $D_i$ is asymptotically normal.

2. It is possible to customise the p values to take into
account a priori information pertinent to an
application. This is achieved by using a set of positive
real constants $c_1, \ldots, c_P$, which have values directly
proportional to the importance of the individual
hypothesis. The constants can be set to attain a more
powerful test. If the constants are all equal then
this procedure reduces to the sequentially rejective
Bonferroni test, of which the GSRBT is a generalisation
[5]. The new p values are defined as $S_i = P_i/c_i$.

3. Order the p values in ascending order, $S_{(1)} \le S_{(2)} \le
\ldots \le S_{(P)}$, and let $c_{(i)}$ and $H_{(i)}$ be the corresponding
constants and hypotheses respectively. Also, let
$\alpha_i = \alpha / \sum_{j=i}^{P} c_{(j)}$.

4. The GSRBT, depicted in Figure 1, is performed as
follows: If $S_{(1)} > \alpha_1$ then retain $H_{(1)}, \ldots, H_{(P)}$ and
stop; otherwise, reject $H_{(1)}$ and test the next
hypothesis. This procedure is repeated until either all the
hypotheses are rejected or a set of hypotheses is
retained.

Figure 1. Generalised sequentially rejective
Bonferroni test

5.  Finally,  a  global  decision  is  made  based  on  the  set 
%  =  of  retained  hypotheses.  This  decision 
will  depend  on  the  application. 
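
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of `math.erfc` for the normal tail, and the example p values are assumptions introduced here, and the final global decision (step 5) is left to the application.

```python
import math


def normal_upper_tail(d, mean, std):
    """p value 1 - Phi((d - mean)/std) for a normal test statistic."""
    return 0.5 * math.erfc((d - mean) / (std * math.sqrt(2.0)))


def gsrbt(p_values, weights, alpha):
    """Weighted sequentially rejective Bonferroni test.

    Returns a list of booleans, True where the null hypothesis is rejected."""
    s = [p / c for p, c in zip(p_values, weights)]
    order = sorted(range(len(s)), key=lambda i: s[i])
    rejected = [False] * len(s)
    remaining_weight = float(sum(weights))
    for i in order:
        # alpha_i = alpha / (sum of weights of hypotheses not yet rejected)
        if s[i] > alpha / remaining_weight:
            break                      # retain this and all later hypotheses
        rejected[i] = True
        remaining_weight -= weights[i]
    return rejected


# Hypothetical p values with equal weights (plain sequentially rejective test).
p = [0.01, 0.04, 0.03, 0.005]
print(gsrbt(p, [1.0, 1.0, 1.0, 1.0], alpha=0.05))
```

With equal weights the thresholds are 0.05/4, 0.05/3, 0.05/2, and so on; in the example the two smallest p values are rejected and the procedure stops at the third.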

Now we will summarise the result given in [5], which
proves that the GSRBT maintains the global significance
level $\alpha$. Consider a single hypothesis where $\Pr(P_i > \alpha_i)$
under the null hypothesis is equal to $1 - \alpha_i$. It is this
value $\alpha_i$ which gives us confidence in our test. Similarly,
the objective of a multiple test procedure is to maintain the
global level of significance over all the hypotheses. Let $I$ be
the set of indices of true null hypotheses; then the equivalent
expression for forming a confidence interval for the GSRBT
is [5]

$$\Pr\left( S_i > \frac{\alpha}{\sum_{j \in I} c_j}, \ \forall i \in I \right) \ge 1 - \alpha. \qquad (6)$$

This equation is shown to be true in [5] and therefore the
global level of significance is maintained.


4.  Simulations 

In this section we show results for the classification of
two classes of first order time-varying autoregressive (TAR)
signals. The classes are separable only in the time-frequency
space. The TAR(1) process is defined as:

$$X_n = -a(n) X_{n-1} + U_n \qquad (7)$$


where  (/„  is  zero  mean  Gaussian  with  time  dependent  vari¬ 
ance,  (Xu  [n)  ■  The  AR  parameter  a(n)  gives  a  single  pole  ro¬ 
tating  on  the  unit  circle,  i.e.,  a(n)  =  where 


fiin)  = 


{ 


iV/2-l^  +  0-1 

-^n  +  0.7 


0  <  n  <  iV/2  -  1 
N/2<n<N  -1 


(8) 


is  the  position  of  the  pole  on  the  unit  circle  for  the  first  signal 
and 


f2{n)  = 


-7v^n-|-0.4  0<n<N/2-l 

j^n  -  0.3  N/2<n<N-l 


is the pole position for the second signal. These signals were
chosen because they have the same expected power at each
time instant and the same frequency content over $[0, N - 1]$.
The  SNR  for  the  following  experiments  is  calculated  using 

SNR  =  lOlogio  (Et“o  M/Kr  +  where 
and  are  the  variances  of  the  real  and  imaginary  parts  of 
additive  white  Gaussian  noise. 

To assess the performance of our method an Operating
Characteristic (OC) curve was constructed for each class.
We used 15 realisations from each class to estimate $S_1(n, k)$
and $S_2(n, k)$. The constants $c_i$ were set equal for these
experiments. The OCs are shown in Figure 2. The matched
filter and frequency spectrum method [7] naturally do not
perform well for this class of signals. The template used for
the matched filtering was an ensemble average calculated
with 15 realisations of signals from each class. Figure 3
compares the classification performance of the multiple
hypotheses method against a non-parametric time-frequency
method that discriminates between classes using the
distance between the log of the signal TFD and the log of the
class TFDs [8].


5.  Discussion 


There are a number of optimisations which can be
included for a particular application. The window length and
the overlap used to estimate the TFDs in Eqs (2) and (3) can
be optimised to reflect the degree of non-stationarity in the
classes. The GSRBT can also be customised to an
application. As mentioned in Section 3.1, the weights $c_1, \ldots, c_P$
can be used to increase the power of the test and, in addition,
the accepted hypotheses can be combined in any arbitrary
way to make a global decision. For example, if two or more
hypotheses are mutually exclusive, this information will
influence the global decision.

In Section 3.1 we assumed that the test statistics were
normal; for narrowband signals this is not valid. The normal
distribution is used to calculate the p values and is therefore
crucial to the performance of the test. In [7] it is shown that
for narrowband signals the discriminant function in Eq (2) is
a summation of approximately chi-square random variables.
This result can be used to improve the performance of the
algorithm [6].

The disadvantages with this method are twofold: firstly,
we assume local stationarity to estimate the TFDs; and
secondly, we assume that the signals are Gaussian. The method
presented in [8] is non-parametric and so will be more
appropriate if the Gaussian assumption is not valid.


Figure 2. OC for classification of the two
signals: SNR = -10 dB. Comparison of
matched filter, spectrum, log TFD, and
multiple hypotheses methods. (Panel title: Operating
Characteristics for TimeAR(1) signals, SNR -10 dB.)


6.  Conclusion 

We have presented a new method for the classification
of non-stationary Gaussian signals by combining
time-frequency analysis with multiple hypothesis testing. A
time-frequency distribution is used to separate classes of signals
that are inseparable in either the time or the frequency
domain alone. The use of a multiple hypotheses test, the
generalised sequentially rejective Bonferroni test (GSRBT),
allowed the simultaneous treatment of the set of test statistics
that arise from the time-dependent discriminant function.
The GSRBT can be customised to a particular application
to increase the power of the test.

The performance of this method was evaluated
empirically by classifying two classes of zero mean non-stationary
Gaussian signals. It performed favourably when compared
to the classical methods and another non-parametric
time-frequency method. This gain in performance is dependent
on the Gaussian and local stationarity assumptions.


Figure 3. Probability of correct classification
vs. SNR (horizontal axis: SNR in dB). Comparison of log TFD and
multiple hypotheses methods.

Abstract  A larger class of systems of almost
localized wavepackets is proposed. This class
contains the Cauchy wavelet system and the coherent
state system as special cases. The wavepacket systems
in this class are the eigenfunction systems of a kind
of linear operator, and they are invariant under the
time-shift. Moreover, we show that they are
‘pseudo-orthogonal’ over-complete systems in $L^2$. An
application of them to the diagonalization problem of the
autocorrelation functions of stochastic processes via
over-complete wavepacket systems is also proposed.

1  Introduction 

Time-scale analysis and time-frequency analysis are
closely related to two types of wavepacket systems: the
system of wavelets [1-6] with continuous parameters and
the system of coherent states [2,7-9],
respectively. Both are over-complete systems of almost
localized wavepackets. It has been known that these
systems belong to the ‘generalized coherent states’ [10]. It
has been shown that the two systems, with special
wavepackets, can be represented as the eigenfunction
systems of two respective linear operators [12,11,14].
Both eigenfunction systems have what may be called the
‘time-shift invariant’ property, where the shift of a
parameter causes only a shift in time with the shape
of the wavepacket unchanged. In this paper, we
extend this research to a larger class of
eigenfunction systems which have these properties.

2  Wavepacket  Systems  and 
Quantum  Mechanics 

We will begin by summarizing the mathematical
relations between these two kinds of wavepacket systems
and quantum mechanics. For $h(t) \in L^2(\mathbb{R})$ whose
Fourier transform $\hat h(\omega)$ satisfies

Gfc  =  fZo  <  oo-  (1) 


define 

A(“.‘)(i)i|a|-U(^^)  .  (2) 

Then  the  set  a  6  R, 6  6  R}  is 

a  ‘pseudo-orthogonal’  (NB:  not  ‘orthogonal’)  over¬ 
complete  wavelet  system  in  L^(R),  where  for  an  ar- 
bitraly  f(t)  €  T^(R),  the  relation 

^  f)  =  m  (3) 

holds[2-5].  In  physical  context,  the  above  wavelet 
system  can  be  regarded  as  the  system  of  the  gener¬ 
alized  coherent  states [10]  associated  with  the  affine 
group[15,16],  and  the  parallelism  between  the  wavelets 
and  the  (usual)  coherent  states  associated  with  Weyl- 
Heisenberg  group  has  been  pointed  out  [2,17].  We  will 
summarize  this  parallelism.  Let  g{t)  be  an  element  of 
L^(R)  such  that 

\gitWdt<co.  (4) 

Then,  with  the  definition  of  the  (usual)  coherent  state 

g(0'P\t)  ^  9{t  -  q),  (5) 

the  set  q  eli,q  eR}  is  an  over-complete 

system  in  L^(R),  and,  for  an  arbitrary  f{t)  G  L^(R)  , 
the  ‘pseudo-orthogonal’  relation 

^  I  /fis  dq  dp  {g<^i,P)^  f)  g{i<p){t)  =  f{t)  (6) 
holds,  which  is  parallel  to  (3).  As  a  special  case,  when 

g{t)  =  go{i)  =  Tr~^e~*~ ,  (7) 

the above-defined coherent states correspond exactly
to the wavefunctions of the (usual) coherent states
in quantum mechanics [8,9] in the following sense: Let
$Q$ and $P$ be the position-coordinate operator and the
momentum operator, respectively, which satisfy the
commutation relation $[Q, P] = iI$ ($I$: identity operator), and
define the operator

“  =  (8) 
Then the above coherent states are the eigenfunctions
of the operator $a$ in the following sense:


(9) 

a|a}o  =  o\a)a  with  “  =  +  *P)- 

(10) 

(NB:  This  operator  a  is  corresponding  to  the  operator 
P  such  that 

II 

in  the  expression  used  in  signal  processing.) 

As is well known, this coherent state in quantum
mechanics satisfies the relation

|a)«a(a|  =  ^ 

(11) 

which is mathematically equivalent to a special case of
the relation (7), and the shift of the (complex)
eigenvalue $\alpha$ can be made by the unitary transformation


e7at-7*ala)„  =  -h  y)a  ^^2) 

(where  j  =  "^(q'  +  *p0  )• 

On the other hand, in the wavelet case there exists a wavelet analogue of the above eigenvalue-shift relation in terms of the unitary representation of the affine group [16]. If, with appropriate functions $\theta$ and $\Lambda$, an operator A satisfies the relation

(13)

(where $B = \frac{1}{2}\{Q, P\}$), then

(14)

where $\Lambda(A; s, q)$ denotes the operator obtained by substituting the operator A for $\alpha$ in the function $\Lambda(\alpha; s, q)$. Non-trivial operators that have this property include

$A_k = Q - ikP^{-1}$  (k = 1, 2, 3, ...),  (15)

and when $A = A_k$, $\Lambda(\alpha; s, q) = e^{-s}(\alpha + q)$ [11]. The system of eigenfunctions of the operator $A_k$ corresponds to the Cauchy wavelet system with

h(t)  =  hkit)  ^  (16) 

(where $C_k$ is a constant). In other words,

Q(f|a)A.  (17) 

with $A_k |\alpha\rangle_{A_k} = \alpha |\alpha\rangle_{A_k}$  ($\alpha = b + ai$).  (18)

(NB: This operator $A_k$ corresponds to the operator $G_k$ such that

$(G_k f)(t) = t f(t) + k \int_{-\infty}^{t} f(s) \, ds$

in the expression used in signal processing.)

The Fourier transform of this eigenfunction is identical to the wavefunction of the affine coherent state proposed by Paul [12] (see also [17,18]). Moreover, the wavelet analogue of the relation (3) is obtained by translating the relation (3) into quantum-mechanical language, with $a = e^{-s}$ and $b = e^{-s} q$, as

^  /c  (“I  ~  7  (1^) 

The operator $A_k$ satisfies the commutation relation

$[A_k, A_k^\dagger] = 2kP^{-2} = -\frac{1}{2k} (A_k - A_k^\dagger)^2,$  (20)

which is more complicated than $[a, a^\dagger] = I$.
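
The commutation relation (20) can be checked directly from the definition (15). The short derivation below is only a sketch: it assumes $[Q, P] = iI$, that Q and P are self-adjoint, and that P is invertible on the domain considered; the intermediate identity for $[Q, P^{-1}]$ is a standard consequence of these assumptions rather than something stated in the text.

```latex
\begin{align*}
[Q, P^{-1}] &= -P^{-1}[Q, P]P^{-1} = -iP^{-2},\\
[A_k, A_k^\dagger] &= [Q - ikP^{-1},\; Q + ikP^{-1}] = 2ik\,[Q, P^{-1}] = 2kP^{-2},\\
(A_k - A_k^\dagger)^2 &= (-2ikP^{-1})^2 = -4k^2P^{-2}
\;\Longrightarrow\; -\frac{1}{2k}\,(A_k - A_k^\dagger)^2 = 2kP^{-2}.
\end{align*}
```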

3  A Larger Class of Eigenfunction Systems

The wavelet system and the coherent-state system are the eigenfunction systems of the operators $A_k$ and a, respectively. These two operators belong to the class

$\{ c(Q + iy(P)) \mid c : \text{real},\ y(P) : \text{function of } P \}.$  (21)

With $c = \frac{1}{\sqrt{2}}$ and $y(P) = P$, the operator a related to the coherent states is obtained, while the operator $A_k$ related to the wavelets is obtained with $c = 1$ and $y(P) = -kP^{-1}$. Here the constant c is not essential, because multiplication by a scalar does not change the eigenfunction system. So, without loss of generality, we investigate the eigenfunction systems of the operators that belong to

$\{ Q + iy(P) \ ;\ y(P) : \text{function of } P \}.$  (22)

Define

$A^{(y)} := Q + iy(P),$  (23)

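and denote the eigenfunction (in the position representation) of this operator by

$\psi_\alpha^{(y)}(t) = {}_Q\langle t | \alpha \rangle_{A^{(y)}}$  (24)

with $A^{(y)} |\alpha\rangle_{A^{(y)}} = \alpha |\alpha\rangle_{A^{(y)}}.$  (25)

(NB: This operator $A^{(y)}$ corresponds to the operator $G^{(y)}$ such that

$(G^{(y)} f)(t) = t f(t) + i \left( y\left(-i \frac{d}{dt}\right) f \right)(t)$

in the expression used in signal processing.)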

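Then, formally, the eigenfunction $\psi_\alpha^{(y)}(t)$ is the solution of the equation

$t \psi_\alpha^{(y)}(t) + i \, y\left(-i \frac{d}{dt}\right) \psi_\alpha^{(y)}(t) = \alpha \psi_\alpha^{(y)}(t).$  (26)

(Note that the inverse of the differential operator is the integral operator.) Hence, it is shown that the Fourier transform of $\psi_\alpha^{(y)}(t)$,

$\Psi_\alpha^{(y)}(p) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \psi_\alpha^{(y)}(t) e^{-ipt} \, dt$  (27)

$(= {}_P\langle p | \alpha \rangle_{A^{(y)}}),$

satisfies the relation

$i \frac{d}{dp} \Psi_\alpha^{(y)}(p) + i y(p) \Psi_\alpha^{(y)}(p) = \alpha \Psi_\alpha^{(y)}(p).$  (28)

From this,

$\frac{d}{dp} \left( \log \Psi_\alpha^{(y)}(p) \right) = -y(p) - i\alpha.$  (29)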

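Let $\{p_n \mid n = 0, 1, 2, 3, ..., M\}$ be the set of zeros of $e^{-Y(p)}$ (including $\pm\infty$ formally when $\lim_{p \to \pm\infty} e^{-Y(p)} = 0$). Then we obtain the solutions

$\Psi_{n,\alpha}^{(y)}(p) = C_{n,\alpha}^{(y)} e^{-Y(p) - i\alpha p}$ for $p_n < p < p_{n+1}$, and $\Psi_{n,\alpha}^{(y)}(p) = 0$ otherwise,  (30)

with

$Y(p) := \int y(p) \, dp,$  (31)

$b = \mathrm{Re} \, \alpha, \quad a = \mathrm{Im} \, \alpha.$  (32)

Here, $C_{n,\alpha}^{(y)}$ is determined so that the eigenfunction is normalized as

$\int_{p_n}^{p_{n+1}} |\Psi_{n,\alpha}^{(y)}(p)|^2 \, dp = 1.$  (33)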

Because the factor $e^{-ibp}$ in (30) has no influence on this normalization condition, we can choose a constant $C_{n,\alpha}^{(y)}$ that does not depend on b but only on $y(\cdot)$ and a, as

ck^.l  =  (34) 

Note that a normalized solution with support $[p_n, p_{n+1}]$ does not exist for every $\alpha$, because the function $e^{-Y(p) - i\alpha p}$ may not be square-integrable on some intervals. However, as is easily shown, whether the normalized solution exists depends only on $y(\cdot)$ and the imaginary part a of the eigenvalue $\alpha$. For example, in the wavelet case with $y(P) = -kP^{-1}$, the function $\Psi_{0,\alpha}^{(y)}(p)$ with support $(-\infty, 0]$ can be normalized for $a > 0$, while the function $\Psi_{1,\alpha}^{(y)}(p)$ with support $[0, \infty)$ can be normalized for $a < 0$. In the coherent-state case with $y(P) = P$, the function $\Psi_{0,\alpha}^{(y)}(p)$ with support $(-\infty, \infty)$ can always be normalized. It is easily shown that the function $\Psi_{n,\alpha}^{(y)}(p)$ ($1 \le n \le M - 1$) with compact support $[p_n, p_{n+1}]$ can always be normalized unless $Y(p)$ contains a singularity on this support. Similarly, it is easily shown that there are $a_+$ and $a_-$ (which may be $\pm\infty$) such that the function $\Psi_{0,\alpha}^{(y)}(p)$ with support $(-\infty, p_1]$ can be normalized for $a > a_+$ and the function $\Psi_{M,\alpha}^{(y)}(p)$ with support $[p_M, \infty)$ can be normalized for $a < a_-$, unless $y(P)$ contains a singularity on these supports. When the normalization is possible, then from (30), (33) and (34),

ckl'.l  =  ^  (35) 

with a real-valued function $\theta(a)$.

The solution (30) with the normalization coefficient (35) implies that its inverse Fourier transform satisfies

$\psi_{n, b+ai}^{(y)}(t) = \psi_{n, ai}^{(y)}(t - b).$  (36)

This relation shows that shifting the real part of the eigenvalue does not change the shape of the wavepacket but causes only a time shift. This property is very useful for signal processing.
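
The time-shift property (36) can be illustrated numerically. The sketch below assumes the coherent-state case $y(P) = P$, so that $Y(p) = p^2/2$, drops the normalization constant, and evaluates the inverse Fourier transform by a plain Riemann sum; the function name `wavepacket` and the grid sizes are illustrative choices, not part of the original text.

```python
import numpy as np

def wavepacket(t, a, b, p_max=12.0, n_p=2001):
    """Inverse Fourier transform of exp(-Y(p) - i*alpha*p) with Y(p) = p**2 / 2.

    alpha = b + i*a. Normalization constants are omitted because only the
    shape and the position of the wavepacket matter here.
    """
    p = np.linspace(-p_max, p_max, n_p)
    dp = p[1] - p[0]
    alpha = b + 1j * a
    spectrum = np.exp(-0.5 * p**2 - 1j * alpha * p)
    # psi(t) = (1/sqrt(2*pi)) * integral of spectrum(p) * exp(i*p*t) dp
    kernel = np.exp(1j * np.outer(t, p))
    return kernel @ spectrum * dp / np.sqrt(2.0 * np.pi)

t = np.linspace(-10.0, 10.0, 801)          # grid step 0.025
psi_ref = wavepacket(t, a=0.5, b=0.0)
psi_shift = wavepacket(t, a=0.5, b=3.0)

peak_ref = t[np.argmax(np.abs(psi_ref))]
peak_shift = t[np.argmax(np.abs(psi_shift))]

# psi_shift(t) should equal psi_ref(t - 3); compare on the overlapping grid.
shift_samples = int(round(3.0 / (t[1] - t[0])))
max_dev = np.max(np.abs(psi_shift[shift_samples:] - psi_ref[:-shift_samples]))
print(peak_ref, peak_shift, max_dev)
```

Changing only the real part b of the eigenvalue moves the packet along the time axis without altering its envelope, which is the behaviour stated in (36).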

We can also easily show that the wavepacket $\psi_{n,\alpha}^{(y)}(t)$ has finite variance in both the time domain and the frequency domain, which implies that the wavepackets are almost localized in both domains.

Another  important  property  of  these  types  of  eigen¬ 
function  systems  is  the  pseudo-orthogonality  men¬ 
tioned  above  when  the  inverse  Fourier  transform  of  the 
function  ea:p(2y(s/2))  or  the  inverse  Laplace  trans¬ 
form  of  the  function  exp(a^s  -f  2y(— 2V/2))  or  that  of 


earp(— a_s  +  2y(— s/2))  exists.  Define  them  if  exist,  as 

“(“)  =  ^  iZo 

(37) 

«+(«)= 

(c  >  a+) 

(38) 

(c  >  — a_). 

(39) 

Using these functions and the coefficients in (34), (35), define

“  2,r|cS|2 

(40) 

u;+(a)  ^ 

==  -  «+)  fZoo  e-^^^P'^+^<'Pdp 

(41) 
>»"  (“) =


(42) 


Then  we  can  show  that  the  pseudo-orthogonal  com¬ 
pleteness  relation 


n  r  fZ,  dadb 
Z_-/n=0  J— oo  J—oo 


(43) 


holds, where $v_n(a) = w_n(a)$ for $1 \le n < M$, $v_0(a) = w_-(a)$ and $v_M(a) = w_+(a)$. (An outline of the proof is given in the latter paper of [14].)


4  An  Application 

The eigenfunction systems proposed above can be applied to the pseudo-diagonalization of the auto-correlation functions of stochastic processes. In order to regard a stochastic process as a superposition of uncorrelated random wavepackets, we must know how to diagonalize the auto-correlation function by the over-complete wavepacket system. In the case where the over-complete wavepacket system is a wavelet system, a systematic method for this problem has been proposed [13,14]. This method is based on non-commutative operator algebra, and it utilizes the fact that the wavelet system is the eigenfunction system of the operator $A_k$. We can extend this method directly to the more general wavepacket systems proposed above.

Let $f(\alpha, \alpha^*)$ be a function of a complex variable $\alpha$ which is expanded by the operator defined in (23) as

$f(\alpha, \alpha^*) = \sum_{m,n} c_{m,n} \alpha^m (\alpha^*)^n.$  (44)

Then,  define  the  operator  in  ‘normal  order’  and  the 
operator  in  ‘antinormal  order’  (in  extended  version)  by 

(45) 

and 

(46) 

respectively. $A^{(y)}$ and its adjoint $A^{(y)\dagger}$ do not commute but satisfy the commutation relation

$[A^{(y)}, A^{(y)\dagger}] = 2 \left. \frac{dy(p)}{dp} \right|_{p=P}.$  (47)

When a stochastic process $\{x(t)\}$ with mean 0 and finite variance is given, define the auto-correlation function of $\{x(t)\}$ as

$R(t_1, t_2) := E[x(t_1) x^*(t_2)].$  (48)

Then, in a manner similar to the method used in [13,14], we can show that the following relations hold.


Ji(s,t)  =  Enf  f^o(a)dadb 

■7n(a,6)t^S+ai(«)l(’S;m(0  . 

(49) 

7„(a,6)  =  7[,(6-f  ai) 

(50) 

(51) 

/3'„(a)  i  /3„(/m  a.  Re  a)  . 

(52) 

(53) 

Using these relations, the auto-correlation function can be transformed into the pseudo-orthogonal form (49) by the wavepackets of the eigenfunction system.


5  Conclusion 

A larger class of time-shift-invariant eigenfunction systems of almost localized wavepackets has been proposed, which contains the Cauchy wavelets and the coherent states. The pseudo-orthogonality of these function systems has also been investigated. An application to the diagonalization of the auto-correlations of random signals via wavepackets has also been proposed.
ABSTRACT

We show that the blind LTI channel estimation problem, when the input sequence is independent but has time-varying statistics, mimics that for the i.i.d. case under appropriate persistence of excitation conditions. Hence, consistent parametric and non-parametric estimators based on a single realization are readily obtained. We establish an ergodicity theorem for the time-averages of non-stationary continuous-time processes; we use this to establish blind identifiability of the LTI channel of a filtered inhomogeneous point process with multiplicative marks. These results extend to a class of time-varying channels as well. The theoretical results are corroborated by simulations.


1  LTI  SYSTEMS  WITH  NON-STATIONARY 
WHITE  INPUTS 

Let x(t) be a temporally independent discrete-time (DT) sequence whose statistics are time-varying, i.e., its cumulants (assuming that they exist) can be written in the form

$c_{kx}(t; \tau_1, ..., \tau_{k-1}) := \mathrm{cum}(x(t), x(t + \tau_1), ..., x(t + \tau_{k-1}))$
$= \gamma_{kx}(t) \, \delta(\tau_1) \cdots \delta(\tau_{k-1}).$  (1)

A simple example of such a process is the scaled process x(t) = u(t)s(t), where u(t) is an i.i.d. sequence and s(t) is non-random. This model is often used to approximate seismic reflectivity sequences, where the variance of the process x(t) is known to decay exponentially with time.

Let h(t) be the impulse response of a linear time-invariant (LTI) system, and let

$y(t) = \sum_p h(p) x(t - p); \quad z(t) = y(t) + w(t),$  (2)

where w(t) is assumed to be stationary and independent of the signal y(t). If the input x(t) satisfies (1), then with $\tau_0 = 0$,

$c_{ky}(t; \tau_1, ..., \tau_{k-1}) = \sum_p \gamma_{kx}(p) \prod_{i=0}^{k-1} h(t + \tau_i - p).$  (3)

If x(t) is i.i.d., $\gamma_{kx}(p)$ is independent of p and (3) reduces to the well-known Bartlett-Brillinger-Rosenblatt formula [7]. The $\gamma_{kx}(p) = \gamma_{kx}$ case has been well studied [7].

Our objectives are as follows: given only the noisy output z(t) in (2), we may want to estimate the channel h(t), or the input x(t), or some statistics of the input. The continuous-time version of this problem has been studied for the k = 2 case in [8], under certain restrictive assumptions on h(t). If $\gamma_{kx}(p)$ is periodic (e.g., x(t) = u(t)s(t), with u(t) stationary and s(t) periodic), then $c_{ky}$ is also periodic in t, and one can use cyclic statistics to estimate the channel.

Assume as in [3] that the joint cumulants of y(t) and w(t) are absolutely summable, and that the appropriate limits exist (the assumptions hold under the sufficient conditions of bounded $\gamma_{kx}(t)$'s and exponentially bounded h(t)'s; in the case of the scaled process x(t) = u(t)s(t) considered earlier, s(t) should be bounded). Let

$H(0) := \sum_t h(t), \quad \Gamma_{kx} := \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \gamma_{kx}(t).$

Under our modeling assumptions, both H(0) and $\Gamma_{kx}$ are well defined and finite valued. Assume for convenience that $H(0)\Gamma_{1x} = 0$; let $\tau_k := (\tau_1, ..., \tau_{k-1})$, and

$M_{kh}(\tau_k) := \sum_t \prod_{i=0}^{k-1} h(t + \tau_i),$

which is the deterministic k-th order correlation of the impulse response (IR) h(t). From (3), with $\langle \cdot \rangle$ denoting the time average over t, we obtain

$\langle c_{ky}(t; \tau_k) \rangle = \Gamma_{kx} M_{kh}(\tau_k).$

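From [3], we know that the sample estimate

$\hat{M}_{ky,T}(\tau_k) := \frac{1}{T} \sum_{t=1}^{T} \prod_{i=0}^{k-1} y(t + \tau_i)$
$\to \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E\left\{ \prod_{i=0}^{k-1} y(t + \tau_i) \right\}$
$= \langle m_{ky}(t; \tau_k) \rangle := M_{ky}(\tau_k),$

which is the time-averaged k-th order ensemble moment of the random process y(t).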

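For k = 2, 3 we obtain for the linear model in (2)

$M_{3y}(\tau_1, \tau_2) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} m_{3y}(t; \tau_1, \tau_2)$
$= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_p \gamma_{3x}(t - p) h(p) h(p + \tau_1) h(p + \tau_2)$
$= M_{3h}(\tau_1, \tau_2) \, \Gamma_{3x}$  (4)

$M_{2y}(\tau) = M_{2h}(\tau) \, \Gamma_{2x}$  (5)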

If the input is persistently exciting in the sense that $\Gamma_{2x} > 0$, it follows from (5) that the sample estimate of $M_{2y}(\tau)$ yields, up to the scale factor $\Gamma_{2x}$, $M_{2h}(\tau)$, the deterministic correlation of the IR h(t), from which one can obtain a spectrally equivalent estimate of h(t). Similarly, if $\Gamma_{3x} \ne 0$, the sample estimate of $M_{3y}(\tau_1, \tau_2)$ yields $M_{3h}(\tau_1, \tau_2)$, the deterministic bicorrelation of the IR h(t), from which one can estimate h(t) without making any phase assumptions; see [13] for some caveats. Several parametric and non-parametric methods are available to obtain the IR estimates. If the additive noise w(t) in (2) is stationary, has zero mean and zero bispectrum, then

$\hat{M}_{3z,T}(\tau, \rho) \to M_{3h}(\tau, \rho) \, \Gamma_{3x}.$
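
Relation (5) is easy to check on synthetic data. The following is a minimal sketch under stated assumptions: the FIR impulse response `h`, the Gaussian i.i.d. sequence `u`, the sample size and the helper names are all hypothetical choices made for illustration, and the linearly decaying scaling s(t) mimics the first simulated case described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
h = np.array([1.0, 0.5, -0.3])      # hypothetical FIR impulse response
s = np.linspace(2.0, 1.0, N)        # deterministic, slowly decaying scaling
u = rng.standard_normal(N)          # i.i.d. sequence with unit variance
x = s * u                           # independent but non-stationary input
y = np.convolve(x, h)[:N]           # y(t) = sum_p h(p) x(t - p)

def time_avg_corr(y, max_lag):
    """Sample estimate (1/T) sum_t y(t) y(t + tau) for tau = 0..max_lag."""
    T = len(y) - max_lag
    return np.array([np.mean(y[:T] * y[tau:tau + T])
                     for tau in range(max_lag + 1)])

def deterministic_corr(h, max_lag):
    """M_2h(tau) = sum_t h(t) h(t + tau)."""
    out = []
    for tau in range(max_lag + 1):
        if tau < len(h):
            out.append(np.sum(h[:len(h) - tau] * h[tau:]))
        else:
            out.append(0.0)
    return np.array(out)

gamma_2x = np.mean(s**2)            # time-averaged input variance (E{u^2} = 1)
m2y_hat = time_avg_corr(y, 3)
m2y_theory = gamma_2x * deterministic_corr(h, 3)
print(np.round(m2y_hat, 3), np.round(m2y_theory, 3))
```

The time-averaged correlation of the output follows the deterministic correlation of h(t) scaled by the averaged input variance, even though no single-time ensemble statistic of y(t) is stationary.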

Once H(z) has been estimated, we can construct a zero-forcing equalizer (zfe) to estimate the channel input x(t). In the SIMO case, where we have measurements $y_i(t) = \sum_p h_i(p) x(t - p)$, i = 1, ..., L, we can estimate $H_i(z) = B_i(z)/A_i(z)$, i = 1, ..., L. To estimate x(t), we find FIR filters $G_i(z)$ such that $\sum_{i=1}^{L} B_i(z) G_i(z) = 1$. The existence of such $G_i$'s is guaranteed by the Bezoutian theorem provided the $B_i(z)$'s are coprime [6]. This overcomes problems in inverting the individual $H_i(z)$'s (e.g., zeros on the unit circle, sharp band-pass filters, etc.). The input estimate is provided by $\sum_{i=1}^{L} G_i(z) A_i(z) y_i(t)$. Of course, zfe's are not useful in the noisy case, where one must use Wiener-type filters. The additive non-Gaussian noise case can be handled under poor SNR conditions by first estimating the spectrum of the noise [9, 10]. We note that some of these ideas have recently been used in the context of fractional sampling.

The  fourth-order  case  is  a  bit  more  complicated.  It  also 
illustrates  that  although  cumulants  appear  to  be  the  natural 
tools  for  dealing  with  stationary  processes  and  LTI  systems, 
the  convenience  is  lost  when  one  deals  with  non-stationary 
processes (similar difficulties are apparent in the treatment of fourth-order cyclic cumulants, and with multiplicative models, which are intrinsically non-linear).

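A natural way to estimate the time-averaged cumulant is to combine the time-averaged moments in the usual manner,

$\hat{C}_{4y}(\tau_1, \tau_2, \tau_3) = \langle m_{4y}(t; \tau_1, \tau_2, \tau_3) \rangle - [3] \, \langle m_{2y}(t; \tau_1) \rangle \langle m_{2y}(t; \tau_2 - \tau_3) \rangle$
$= \langle c_{4y}(t; \tau_1, \tau_2, \tau_3) \rangle + [3] \, \langle m_{2y}(t; \tau_1) \, m_{2y}(t + \tau_3; \tau_2 - \tau_3) \rangle - [3] \, \langle m_{2y}(t; \tau_1) \rangle \langle m_{2y}(t; \tau_2 - \tau_3) \rangle$

where the [3] denotes the three terms obtained by permuting the $\tau_i$'s. Since $\langle m_{2y}(t; \tau_2 - \tau_3) \rangle = \langle m_{2y}(t + \tau_3; \tau_2 - \tau_3) \rangle$, we obtain

$\hat{C}_{4y}(\tau_1, \tau_2, \tau_3) = \langle c_{4y}(t; \tau_1, \tau_2, \tau_3) \rangle + [3] \, \mathrm{dcov}(m_{2y}(t; \tau_1), m_{2y}(t + \tau_3; \tau_2 - \tau_3))$  (6)

where dcov denotes the deterministic covariance,

$\mathrm{dcov}(f(t), g(t)) := \langle f(t) g(t) \rangle - \langle f(t) \rangle \langle g(t) \rangle.$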

Eqn (6) expresses the estimate $\hat{C}_{4y}$ as the sum of the true quantity $\langle c_{4y}(t; \tau_1, \tau_2, \tau_3) \rangle$ and 3 error terms; these error terms are small if the deterministic covariance of the second moment function is small; the error term can even be zero for specific $\tau_i$'s. Eqn (6) gives a precise quantification of the error and of what 'slow variations' should be, if the stationarity assumption is to be invoked in estimating fourth-order cumulants. From (3) we note that this is a function of both the filter h(t) and the variations in $\gamma_{kx}(t)$.

For example, let x(t) = u(t)s(t) with u(t) i.i.d., and $s(t) = 1 + a \cos(2\pi f_0 t)$. Let H(f) be band-limited to $f_c \pm B$; then the 'error' terms will be non-zero only if $m f_0 \in [f_c \pm 2B]$, m = 1, 2. The analysis is readily extended to any periodic s(t); a simple conclusion is that if H(f) is low-pass or high-pass compared with $f_0$, the fundamental frequency, then the error terms vanish.

Simulations 

The LTI system was chosen to be an AR(2) model with parameters [1, 0, 0.75]; an i.i.d. Laplace sequence u(n) with parameter $\lambda = 1$ was generated; the input to the LTI system was the non-stationary independent sequence x(n) = s(n)u(n), where s(n) is a deterministic amplitude scaling sequence. In the first case, s(n) decreased linearly from 2 at n = 1 to 1 at n = N, where N = 1024 is the number of samples. In the second case, s(n) = 1 + cos(n/10) is periodic with a period of $20\pi$ samples. The three panels of Figure 1 show the estimated autocorrelation sequence for the stationary case (s(n) = 1) and for the two non-stationary cases; the zero-lag term was normalized to account for scaling differences ($\Gamma_{kx} \ne 1$). The panels show the true values, the mean estimate and the error bars estimated from a set of 100 Monte Carlo trials. Figure 2 displays the estimates of $C_3(\tau, 0)$ for the same set of data. It is clear that the estimates are unbiased: the curves corresponding to the true value and the mean estimate are virtually indistinguishable in the two figures. Correlation- and cumulant-based normal equations were used to estimate the AR parameters for the three cases. Table 1 shows the mean and standard deviations. In accordance with the theory, good parameter estimates are obtained in all three cases.
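
The correlation-based part of this experiment can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses a single long realization instead of 100 Monte Carlo trials of 1024 samples, takes the Laplace scale parameter as 1, and solves the second-order Yule-Walker normal equations with time-averaged sample correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20_000                          # longer than 1024 to keep the demo stable
n = np.arange(N)
s = 1.0 + np.cos(n / 10.0)          # periodic scaling, period 20*pi samples
u = rng.laplace(loc=0.0, scale=1.0, size=N)
x = s * u                           # non-stationary independent input

a_true = np.array([1.0, 0.0, 0.75]) # AR polynomial [1, a(1), a(2)]
y = np.zeros(N)
for k in range(N):
    # y(k) + a(1) y(k-1) + a(2) y(k-2) = x(k)
    y[k] = x[k]
    if k >= 1:
        y[k] -= a_true[1] * y[k - 1]
    if k >= 2:
        y[k] -= a_true[2] * y[k - 2]

def sample_corr(y, lag):
    """Time-averaged sample correlation at the given lag."""
    return np.mean(y[:len(y) - lag] * y[lag:])

r0, r1, r2 = (sample_corr(y, lag) for lag in range(3))
# Yule-Walker normal equations: r(tau) + a(1) r(tau-1) + a(2) r(tau-2) = 0, tau = 1, 2
R = np.array([[r0, r1], [r1, r0]])
a_hat = np.linalg.solve(R, -np.array([r1, r2]))
print(np.round(a_hat, 3))
```

Because the time-averaged correlation of y(n) is proportional to the deterministic correlation of the impulse response, the normal equations remain valid although the driving sequence is not stationary.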

2  FILTERED  INHOMOGENEOUS  POISSON 
PROCESSES 

An interesting extension of the preceding ideas is in estimation problems connected with the class of marked filtered inhomogeneous Poisson processes (IPPs), which are continuous-time processes. The extension is natural since IPPs are a limiting case of non-stationary Bernoulli processes.

Let $\tau_i$ denote the occurrence times of an IPP with intensity rate $\lambda(t)$; let u(t) denote the mark process, which is assumed to be i.i.d. and independent of the IPP; let

$y(t) = \sum_{n=1}^{N(t)} h(t, \tau_n; u_n), \quad t_0 \le t \le T,$  (7)

where h is the IR of a causal linear time-varying (LTV) system, and N(t) denotes the number of events over $[t_0, t)$. We make the standard assumption that the mark process is i.i.d. and independent of N(t).

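The second characteristic functional of y(t) in (7) is (following [12, Sec 5.7])

$\Psi(v) := \ln \Phi(v) := \ln E\left\{ \exp\left[ j \int y(\sigma) \, v(d\sigma) \right] \right\}$
$= \int \lambda(\tau) \, E_u\left\{ \exp\left[ j \int h(\sigma, \tau; u) \, dv(\sigma) \right] - 1 \right\} d\tau$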

where  the  expectation  is  with  respect  to  (wrt)  the  marks, 
and  v{a)  is  a  suitably  chosen  function.  For  example,  with 


To,  to  <(T  <  h 

«(<^)  ~  S  Etel  </'  +  >  rp 

I  7  =  1, k;  i;f+i  =  1 


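we obtain,

$\ln E\left\{ \exp\left[ j \sum_{i=1}^{k} a_i y(t_i) \right] \right\} = \int_{t_0}^{T} \lambda(\tau) \, E_u\left\{ \exp\left[ j \sum_{i=1}^{k} a_i h(t_i, \tau; u) \right] - 1 \right\} d\tau$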


Differentiating wrt the $a_i$'s, and evaluating the derivative at the origin, we obtain,

$\mathrm{cum}(y(t_1), ..., y(t_k)) = \int_{t_0}^{T} \lambda(\tau) \, E_u\left\{ \prod_{i=1}^{k} h(t_i, \tau; u) \right\} d\tau$  (8)

$= E\{u^k\} \int_{t_0}^{T} \lambda(\tau) \prod_{i=1}^{k} h(t_i - \tau) \, d\tau.$  (9)


The last equality follows from the assumption that $h(t, \tau_n; u_n) = h(t - \tau_n) u_n$, i.e., the marks are multiplicative and the system is time-invariant; this model has been used to study low frequency noise [12, p 217].

Notice the similarity between (3) and (9). Here,

$\gamma_{kx}(t) = \lambda(t) E\{u^k\}.$

The unmarked process is obtained when u(t) = 1. It is interesting to note that the cumulants of all orders of the unmarked IPP are non-zero and identical to one another; this also demonstrates that the Poisson process is strongly non-Gaussian, and does not easily satisfy Gaussian central limit theorems [2]. On the other hand, this also facilitates performance analysis.

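Let the time-averaged intensity be denoted by ($t_0 = 0$)

$\bar{\lambda} := \lim_{T \to \infty} \frac{1}{T} \int_0^T \lambda(t) \, dt = \lim_{T \to \infty} \frac{1}{T} \Lambda(T).$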


For the IPP, the earlier assumption $H(0)\Gamma_{1x} = 0$ translates to E u = 0 or H(0) = 0 in practice (the case of a finite-support $\lambda(t)$, i.e., $\bar{\lambda} = 0$, will be discussed elsewhere). The condition H(0) = 0 is readily guaranteed, for example, by cascading a DC notch filter.

In the previous section we used the ergodicity results of [3], which were derived for discrete-time processes. Here, we state the continuous-time counterparts (proofs are omitted for lack of space).

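Theorem 1. Let the processes $\{y_i(t)\}_{i=1}^{M}$ have absolutely integrable joint cumulants,

$\int \sup_t \left| \tau_j \, \mathrm{cum}(y_{\ell_0}(t), ..., y_{\ell_{K-1}}(t + \tau_{K-1})) \right| d\tau < \infty,$

where $d\tau := d\tau_1 \cdots d\tau_{K-1}$, j = 1, ..., K - 1, and the $\ell_i$'s take on possibly repeated values in [1, ..., M]. Let $\tau_0 := 0$, and assume that the limit

$M(\tau_K) := \lim_{T \to \infty} \frac{1}{T} \int_0^T dt \, E\left\{ \prod_{i=0}^{K-1} y_{\ell_i}(t + \tau_i) \right\}$

exists and is finite. Let

$\hat{M}_T(\tau_K) := \frac{1}{T} \int_0^T \prod_{i=0}^{K-1} y_{\ell_i}(t + \tau_i) \, dt$

denote the sample estimate. Then,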


=  0(T-n  . 

Further, the estimates are asymptotically normal.

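As in the case of DT processes, the k = 2, 3 cases are illustrative (recall that the assumption E u = 0 or H(0) = 0 implies E y(t) = 0):

$\hat{M}_{2y}(\tau) \to \langle m_{2y}(t; \tau) \rangle$
$= \lim_{T \to \infty} \frac{1}{T} \int_0^T dt \, c_{2y}(t; \tau)$
$= \bar{\lambda} E\{u^2\} \int h(\sigma) h(\sigma + \tau) \, d\sigma$  (10)

$\hat{M}_{3y}(\tau, \rho) \to \langle m_{3y}(t; \tau, \rho) \rangle$
$= \bar{\lambda} E\{u^3\} \int h(\sigma) h(\sigma + \tau) h(\sigma + \rho) \, d\sigma$  (11)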


From the estimated $M_{2y}$ and/or $M_{3y}$, one can estimate h(t) (parametrically and/or non-parametrically); in practice, one may sample the filter output y(t) and use DT algorithms. As in the DT case, the fourth-order case is complicated.

Thus, using the ergodicity theorems for CT processes, we have established that the channel h(t) can be estimated blindly, i.e., without knowledge of $\lambda(t)$, provided the persistence of excitation condition, $\bar{\lambda} > 0$, holds. As in the DT case, additive noise whose joint pdf's are symmetrically distributed (e.g., Gaussian, Laplace) can be handled by using $M_{3y}$. The non-Gaussian noise case can be handled as mentioned in an earlier section.

Once the channel h(t) has been estimated, we can use EM-type approaches [1] or constrained estimators [11] to overcome problems with the MLE. Alternatively, we can equalize the channel and estimate z(t); as noted earlier, this is easier in the SIMO case. Since the Poisson process is CT, very fine sampling is required.

In  the  noisy  case,  we  must  detect  the  points;  here  again, 
the  approach  of  [9,  10]  allows  us  to  recover  the  spectrum  of 
the  noise.  We  can  now  use  either  LS  or  ML  [12]  to  estimate 
the  parameterized  intensity  function. 

The non-parametric $\lambda(t)$ problem is generally ill-posed, since consistent estimates cannot be obtained from a single realization. If $\lambda(t)$ is periodic (e.g., auditory physiology, optical range-rate finding, shot noise in phase tracking loops, radar clutter in scatter communications, etc.), then one can use cyclic estimates. The period can be estimated efficiently via the FT, a simple generalization of the result in [14]. Using the results of Theorem 1 it is easy to show that the post-stimulus time (PST) histogram, see, e.g., [5], is a consistent estimate of $\lambda(t)$: the mean is $\lambda(t)$ and the variance is $\lambda(t)/K$, where K is the number of periods used to construct the estimate.
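
A PST histogram for a periodic intensity can be sketched as below. The sinusoidal intensity, the number of bins and the number of periods are hypothetical; the sketch also relies on the standard fact that Poisson counts in disjoint bins are independent Poisson variables, so the folded counts are drawn directly rather than by simulating event times. With a bin of width `width`, the variance of each histogram value is the bin-averaged intensity divided by `K * width`.

```python
import numpy as np

rng = np.random.default_rng(2)
period = 1.0
n_bins = 20
K = 2000                            # number of periods folded into the histogram
edges = np.linspace(0.0, period, n_bins + 1)
width = edges[1] - edges[0]

# Hypothetical periodic intensity lambda(t) = 5 + 3 sin(2*pi*t/period).
# Its exact average over each bin has a closed form:
w = 2.0 * np.pi / period
bin_mean = 5.0 + 3.0 * (np.cos(w * edges[:-1]) - np.cos(w * edges[1:])) / (w * width)

# Folded event counts over K periods: Poisson with mean K * (integral of lambda over the bin).
counts = rng.poisson(K * bin_mean * width)

# PST histogram estimate of the intensity in each bin.
pst_hist = counts / (K * width)
max_err = np.max(np.abs(pst_hist - bin_mean))
print(np.round(pst_hist, 2), round(float(max_err), 3))
```

Increasing K tightens the estimate around the bin-averaged intensity, in line with the consistency claim above.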


Simulations 

We can generate an HPP easily since the inter-arrival times are independent, stationary and exponentially distributed. We can convert this to an IPP by non-linear warping of the time axis. We can also generate inter-arrival times of an IPP by generating r.v.'s with the pdf $p(u) = \lambda(u)/\Lambda(T)$. The last two approaches involve computation of the inverse function, $\Lambda^{-1}(v)$. An alternative is the Lewis-Shedler thinning algorithm [4], which first generates points of an 'easy' IPP (i.e., an IPP whose $\Lambda(t)$ function is easily inverted), with intensity function $\lambda_e(t) \ge \lambda(t)$, $\forall t \in [t_0, T]$; points are then deleted by generating an auxiliary set of uniform r.v.'s. We used this approach and generated the HPP points with constant intensity $\lambda \ge \max_t \lambda(t)$. Simulation parameters were: analog filter h(t) = exp(-3t) cos(5t), output sampling interval of 0.01 s, 500 Laplace-distributed marks, 1024 output samples, and 100 trials. An AR(2) model, which is appropriate for this h(t), was fitted to the output time series; the mean and standard deviations, estimated over a Monte Carlo run of 100 trials, are shown in Table 2 for the three input processes: a stationary process, an HPP and an IPP. The TV intensity function for the IPP was $\lambda(t) = 1 + \Lambda(t; 0.2, 0.3) + \Lambda(t; 0.2, 0.7)$, where $\Lambda(t; r, t_0)$ denotes a triangular pulse of duration r centered at $t = t_0$. The stationary process consisted of 500 equi-spaced samples with Gaussian amplitudes. Good parameter estimates were obtained in all three cases, as promised by the theoretical development.
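
The thinning step described above can be sketched in a few lines. The code is an illustration under assumptions, not the authors' implementation: the triangular pulses are taken to have unit height, the intensity is multiplied by a hypothetical `scale` so that a few hundred points fall in [0, 1], and the dominating HPP is drawn as a Poisson count followed by sorted uniform locations, which is equivalent to summing exponential inter-arrival times.

```python
import numpy as np

def tri(t, duration, center):
    """Unit-height triangular pulse of the given duration centered at `center`."""
    return np.maximum(0.0, 1.0 - np.abs(t - center) / (duration / 2.0))

def intensity(t, scale=500.0):
    """lambda(t) = scale * (1 + tri(t; 0.2, 0.3) + tri(t; 0.2, 0.7)); `scale` is assumed."""
    return scale * (1.0 + tri(t, 0.2, 0.3) + tri(t, 0.2, 0.7))

def thinning(intensity_fn, lam_max, t0, T, rng):
    """Lewis-Shedler thinning: simulate an IPP on [t0, T] from an HPP of rate lam_max."""
    n_candidates = rng.poisson(lam_max * (T - t0))
    candidates = np.sort(rng.uniform(t0, T, size=n_candidates))
    # Keep each candidate with probability intensity(t) / lam_max.
    keep = rng.uniform(size=n_candidates) * lam_max <= intensity_fn(candidates)
    return candidates[keep]

rng = np.random.default_rng(3)
lam_max = 2.0 * 500.0               # 1 + two non-overlapping unit triangles never exceeds 2
points = thinning(intensity, lam_max, 0.0, 1.0, rng)
print(len(points))
```

The expected number of retained points is the integral of the intensity over [0, 1], here 500 * 1.2 = 600; the bound `lam_max` only affects efficiency, as long as it dominates the intensity everywhere.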


3  DISCUSSION 

We used the theory of mixed-time averages in [3] to establish the identifiability of DT linear systems driven by non-stationary white processes. We extended the theory to continuous-time processes, and used that to establish blind identifiability of CT linear systems driven by a marked inhomogeneous Poisson process. We stress that these results were established under weak persistence of excitation conditions. As in the case of cyclic statistics, deterministic signals in noise, and multiplicative noise models [13], we note that moment statistics are easier to manipulate than cumulant statistics. The theoretical results were corroborated by simulations.

These results can be extended to a class of self-exciting point processes, called non-stationary renewal processes (nsrp), where the intensity function is of the form

$\mu(t; w_1, ..., w_{N(t)}) = \lambda(t) \, r(t - w_{N(t)}),$

where r(t) is the recovery function, which is assumed to be monotonically non-decreasing with values in [0, 1]. Here, the $w_i$'s are the occurrence times and N(t) is the number of events in $[t_0, t)$, so that $w_{N(t)}$ is the time of occurrence of the last event prior to t. For this model, we are interested in estimating both $\lambda(t)$ and r(t) given the filtered output y(t). In the case of marked processes the analysis is particularly easy if the marks are multiplicative and i.i.d. (the usual assumptions); the non-zero mean case is easier to handle. If the intensity function is periodic, one can exploit cyclic statistics as well. Some of these issues and the extension of Theorem 1 to nsrp's will be presented in the full paper.

IPPs may be useful in modeling impulsive correlated noise (bursts). Here, a non-conventional renewal function (decreasing rather than increasing) is required to model the fluctuations; e.g., $\mu(t | N_t) = \lambda(t)[1 + f(t - w_{N_t})]$; the statistics of the marks dictate the amplitude distribution of the noise. As in the case of DT-LTV models, one could expand $\lambda(t)$ on a set of known basis functions (e.g., sines and cosines), and then estimate the coefficients.

The extension to the TV channel is straightforward, provided the time-variations can be expressed in terms of known basis functions, i.e., the unknown projection coefficients themselves are not time-varying. It is interesting to note that the theory of time-averaged moments is also useful in the input-output analysis of non-linear systems.
Table 1.  AR(2) system identification with stationary and nonstationary two-sided exponential driving sequence. Second and fourth-order statistics results are compared.


Table 2.  AR Estimates for filtered IPP: mean (standard deviation)

Input         a(1)                 a(2)
Stationary    -1.8786 (0.1035)     0.8863 (0.0967)
HPP           -1.8719 (0.1068)     0.8801 (0.0996)
IPP           -1.8599 (0.1447)     0.8690 (0.1348)

Figure 1.  C2 Estimates (panel title: C2 - stationary case)

Figure 2.  C3 Estimates (panel title: C3 - stationary case)
Abstract

A statistical analysis of the polynomial phase signal parameter estimates achieved when using the structured auto-regressive approach is presented. The estimates are consistent for high SNR or a large number of samples, N. An expression for the covariance of the estimates is given. Numerical examples confirm that the theoretical covariance applies well to empirical data for a wide range of SNR and N. The performance of the estimator depends on the filter length, n, and the sampling strategy, which may be non-uniform. The optimal choice of n for evenly sampled cisoids is given as a function of N. The variance is inversely proportional to SNR^2 for small SNR, and to SNR for medium and high SNR.


1  Introduction 

In a variety of applications, such as radar, sonar, geophysics and communication, it is of great interest to estimate the parameters of a non-linear phase function of complex-valued signals. Such signals can be modeled as

$s(t) = \sum_{l=1}^{m} b_l(t) e^{j a_l(t)}$  (1)

where $a_l(t)$ and $b_l(t)$ are real-valued continuous-time functions that model the phase and amplitude, respectively. The signal s(t) is sampled at time instants $\{t_k\}_{k=1}^{N}$ and the measurements of the signal are corrupted by noise, which in many scenarios can be modeled as additive, white and Gaussian

$y(t_k) = s(t_k) + e(t_k)$  (2)

•The  work  was  supported  by  Ericsson  Microwave 
Systems  AB  and  Ericsson  Infocom  Consultants  AB 


where $E[e(t_k)] = 0$, $E[e(t_k) e(t_l)] = 0$ and $E[e(t_k) e^*(t_l)] = \sigma^2 \delta_{k,l}$ $\forall k, l$. A special case of non-linear phase is the polynomial phase function

$a(t_k) = a_0 + a_1 t_k + a_2 t_k^2 + \ldots + a_q t_k^q$  (3)

The polynomial phase function has proved useful in important applications. In Doppler radar, for example, radar returns from maneuvering targets give rise to a non-linear phase that can be modeled by a polynomial. Estimates of the polynomial coefficients can then be used to determine the target's kinematic parameters (velocity, acceleration etc). Considerable attention has been paid to the estimation of the parameters of non-linear phase signals, see for example [1, 3, 4, 5, 6] and the references therein.

In [1] an approach based on a structured auto-regressive model was proposed and shown to be successful in simulations. The proposed method estimates the phase and amplitude parameters of a quite general class of signals, including polynomial phase signals, and has some interesting properties. For example, the structured AR model is a model based time-frequency representation (TFR), and the data are not constrained to be evenly sampled in time, as is the case in [3, 4, 5, 6]. On the contrary, it was empirically shown that the variance of the estimates can be significantly reduced by using a time-varying sampling period. In [1] a numerical study of the performance of the structured AR approach showed that the bias is negligible and that the relative efficiency (the variance of the estimates divided by the Cramér-Rao lower bound) typically attains a value of 1.5-2.0. No theoretical analysis was, however, presented and a number of important questions were left unanswered. Here a theoretical analysis is presented for the case of a mono-component polynomial phase signal with constant amplitude:

$$s(t_k) = b_0\, e^{j a(t_k)} \qquad (4)$$

where $b_0$ is a positive constant and $a(t_k)$ was defined in (3).
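
As a minimal sketch of the signal model in (2)-(4), the following Python code generates a constant-amplitude polynomial phase signal and adds circular white Gaussian noise. The function names, the use of NumPy, the SNR definition $b_0^2/\sigma^2$ and the sampling grid $t_k = 0.01k$, $k = 1, \ldots, 100$, are assumptions made for illustration; the phase coefficients are those of the numerical example in Section 4.

```python
import numpy as np


def polynomial_phase_signal(t, coeffs, b0=1.0):
    """Return b0 * exp(j * a(t)) with a(t) = coeffs[0] + coeffs[1] * t + ..."""
    phase = np.polynomial.polynomial.polyval(t, coeffs)
    return b0 * np.exp(1j * phase)


def add_circular_noise(s, snr_db, rng):
    """Add complex circular white Gaussian noise at the requested SNR (in dB).

    The SNR is taken here as (mean signal power) / sigma^2, an assumption.
    """
    sigma2 = np.mean(np.abs(s) ** 2) / 10 ** (snr_db / 10)
    e = np.sqrt(sigma2 / 2) * (
        rng.standard_normal(s.shape) + 1j * rng.standard_normal(s.shape)
    )
    return s + e, sigma2


# Phase polynomial of Section 4: pi + 30*pi*t - 80*pi*t^2 + 70*pi*t^3
coeffs = np.pi * np.array([1.0, 30.0, -80.0, 70.0])
t = 0.01 * np.arange(1, 101)          # assumed uniform grid, N = 100
s = polynomial_phase_signal(t, coeffs)
y, sigma2 = add_circular_noise(s, snr_db=5.0, rng=np.random.default_rng(0))
```

A non-uniform sampling strategy only changes the vector `t`; nothing else in the sketch depends on the spacing.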

2  Structured  AR  modeling 

Consider a linear projection of $\{y(t_{k-p})\}_{p=1}^{n}$ onto $y(t_k)$


Consider a “structured” analytical AR filter, denoted by $\theta(t_k; \vartheta)$, that is a function of a model signal, parameterized by $\vartheta$. In order to derive the structured AR filter, the following notation is convenient:

i=l 

6i(to;«?)Hl  (7) 


n 

p=l 

where $(c_1, \ldots, c_n)^T = \theta$ denote the AR parameters. It is well known how to calculate the AR parameters that minimize the variance of the prediction errors when the signal is WSS and correlation ergodic. Then, the AR parameters carry information on the signal $\{s(t_k)\}$ and can be used as a tool to get estimates of the signal parameters. The information carried in the AR parameters resembles an average over time, and is obviously of little value if the signal is time-varying. If, however, an ensemble of M realizations was available, then the “instantaneous” $\theta$ at a time instant, say $t_k$, could be calculated from the ensemble, and $\theta(t_k)$ would carry information on the signal at time $t_k$. Properties such as the instantaneous frequency and spectral density could then easily be obtained. The analytical AR filter of order n that minimizes the expected prediction error variance at time instant $t_k$ can be derived as a function of the signal parameters. Let $\vartheta_0$ denote the true signal parameters. Consider the hypothetical case when an infinite number of realizations are available. The AR parameters that minimize the prediction error variance at time $t_k$, denoted by $\theta(t_k; \vartheta)$, satisfy the well known projection theorem, i.e. the projection error shall be orthogonal to the data used,

where $E[\cdot]$ denotes expectation taken over the ensemble. The solution is

which defines the mapping from $\vartheta$ to the AR parameters. In (5) $R(t_k; \vartheta)$ and $r(t_k; \vartheta)$ denote the “structured” covariance matrix and vector, respectively, whose elements consist of the analytical covariance function

$$r(t_k; \vartheta, u, v) = E[y(t_{k-u})\, y^*(t_{k-v})] \qquad (6)$$


The reason for the normalization to 1 in (7) is explained below.

It is straightforward to show that $R(t_k; \vartheta)$ and $r(t_k; \vartheta)$ can be written as


where I denotes an (n | n) identity matrix and $^*$ denotes complex conjugate. Formula (5) involves an inversion of the (n | n) matrix $R(t_k; \vartheta)$, which implies heavy computations. This can be avoided by a straightforward use of the matrix inversion lemma, $(A + BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}$. Applying the lemma with $B = x^*(t_{k-1}; \vartheta)$ and $C = 1$, and assuming that $\sigma^2 \neq 0$, gives the final formula for the structured AR filter:


9(tk;i^)  = 

m;^)  = 

SNRo  = 


SNRo _ 

“  1  +  SNRo  x"(tk-i;^Mik-i;^) 

/r2 
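
The matrix inversion lemma quoted above can be checked numerically. The sketch below, in Python with NumPy, applies it to a rank-one update of a scaled identity, which avoids a full (n | n) inversion. The choice $A = \sigma^2 I$ and $D = x^T$, the dimension, the random vector `x` and the value of `sigma2` are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
sigma2 = 0.5
x = rng.standard_normal((n, 1)) + 1j * rng.standard_normal((n, 1))

A = sigma2 * np.eye(n)      # assumed: white-noise part of the covariance
B = x.conj()                # column vector x^*
C = np.array([[1.0]])       # scalar C = 1
D = x.T                     # assumed: row vector x^T

# Direct inversion of the full matrix
direct = np.linalg.inv(A + B @ C @ D)

# Matrix inversion lemma: only a 1 x 1 "inner" matrix has to be inverted
A_inv = np.eye(n) / sigma2
inner = np.linalg.inv(D @ A_inv @ B + np.linalg.inv(C))
lemma = A_inv - A_inv @ B @ inner @ D @ A_inv

max_diff = np.max(np.abs(direct - lemma))
```

With a rank-one update the inner inverse is a scalar division, which is where the computational saving comes from.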


At this point the normalization in (7) becomes clear. Consider $\beta(t_k; \vartheta)$ above. Suppose $x(t_k; \vartheta)$ was not normalized and write $x(t_k; \vartheta)$ as a normalized vector scaled by some constant $b$. If the normalization was not introduced, then the mapping from the parameters to the AR filter would not be 1:1, which would cause the Hessian used in the non-linear search for the signal parameters to become singular. Note that $r(t_k; \vartheta, u, v)$, and hence $\theta(t_k; \vartheta)$, can be calculated for non-uniformly sampled data. For the case of a mono-component signal with time invariant amplitude, $\beta(t_k; \vartheta)$ simplifies to


$$\beta(t_k; \vartheta) = \beta_0 = \frac{\mathrm{SNR}}{1 + n\,\mathrm{SNR}} \qquad (8)$$


where 
The structured AR filter $\theta(t_k; \vartheta)$ is used to predict $y(t_k)$ and the prediction is given by

$$\hat{y}(t_k; \vartheta) = -\theta^T(t_k; \vartheta)\, \mathbf{y}(t_{k-1}), \qquad \mathbf{y}(t_{k-1}) = (y(t_{k-1}), \ldots, y(t_{k-n}))^T$$

which implies

$$\epsilon(t_k; \vartheta) = y(t_k) - \hat{y}(t_k; \vartheta) = y(t_k) + \theta^T(t_k; \vartheta)\, \mathbf{y}(t_{k-1})$$

The signal parameter estimates are found by applying $\theta(t_k; \vartheta)$ to a single realization and minimizing the prediction error variance with respect to $\vartheta$:

$$\hat{\vartheta} = \arg\min_{\vartheta} V(\vartheta) \qquad (9)$$

$$V(\vartheta) = \frac{1}{N} \sum_{k=n}^{N-1} |\epsilon(t_k; \vartheta)|^2 \qquad (10)$$

The minimization of (10) is easily implemented using e.g. a Gauss-Newton search. The derivatives of $\theta(t_k; \vartheta)$ with respect to $\vartheta$ are straightforward to compute. The search can be implemented off- or on-line (tracking).

3  Statistical  performance 

In the following it is assumed that the signal is described by (3)-(4). All results are given without proof due to lack of space. The proofs can be found in [2].

Result 1 The structured AR signal parameter estimates are consistent for a large number of samples or large SNR, i.e. $\hat{\vartheta} \to \vartheta_0$ as $N \to \infty$ or SNR $\to \infty$.

For asymptotic SNR, the result holds under the assumption that the data set consists of at least q samples with non-zero sampling interval. For asymptotic N, the result holds under the assumption that the observation time increases without bound as $N \to \infty$.

Result 2 Assume the noise is white and Gaussian and that $\hat{\vartheta}$ is close to $\vartheta_0$. Then the signal parameter estimates are Gaussian with covariance matrix

$$\mathrm{cov}(\hat{\vartheta}) = E[V''(\vartheta)]^{-1}\, \mathrm{cov}(V'(\vartheta))\, E[V''(\vartheta)]^{-1}$$

where

cov(V'(t5))  = 

N—\  min(N--l,fc*f n) 

...  x:  E 

fc=n  j=max(n,Jc— n) 


A*,i 


W(«?o)] 


Va* 


(1-/3o)(SNR11^  +  I)Jm-... 

ISNR  +  1)  11^-... 

00^  \l-k\  1,_*  -  . . . 

Po[\k-lGj-k  +  Bk-llJLk)  —  •  •  • 

ih^ik-iiT-k 

N 

VaJ(SNRll^  +  I)Vafc 

fc=n+l 

/  tk-tk-u  \ 

\  ^k-tk-.n,  / 


and $\beta_0$ was defined in (8), I denotes the identity matrix of order n, $J_{l-k}$ denotes an (n | n) matrix with zeros everywhere except on the (l − k):th diagonal which is filled with ones, $\mathbf{1}$ denotes an (n | 1) vector filled with ones, $\mathbf{1}_{l-k}$ denotes an (n | 1) vector with ones on all entries except on entries 1, ..., (l − k) if (l − k) > 0 or n + (l − k), ..., n if (l − k) < 0 which are filled with zeros, and $e_{l-k}$ denotes an (n | 1) vector with zeros on all entries except the (l − k):th which equals 1.


Result 2 holds provided $\hat{\vartheta}$ is close to $\vartheta_0$ which, according to Result 1, will happen for large enough SNR or N. The statistical performance of the signal parameter estimates depends on the structured AR filter length. Result 2 can be used to derive the optimal filter length, n, that minimizes the variance of the signal parameter estimates. The following result holds for the case of a uniformly sampled linear phase signal:


Result  3  For  linear  phase  signals  that  are 
uniformly  sampled  and  of  medium  to  high  SNR, 
the  optimal  choice  of  n  is  y. 


4  Numerical  examples 


In the figures below, dashed curves correspond to empirical results, solid curves to theoretical results (Result 2) and dash-dotted curves to the Cramér-Rao lower bound. The empirical variance was calculated using Monte Carlo simulations based on 50 runs for each set of variable values. The signal was chosen to be a quadratic FM signal: $a(t_k) = \pi + 30\pi t_k - 80\pi t_k^2 + 70\pi t_k^3$. The figures illustrate the performance of $\hat{a}_2$. The corresponding figures for $a_1$ and $a_3$ show the same behavior and are therefore omitted.

Figures 1-2 illustrate how the variance of the structured AR estimates depends on the filter length n (SNR = 5 dB), and on SNR (n = 15), respectively; N = 100, data uniformly sampled with $t_k - t_{k-1} = 0.01$.

As seen from Fig. 1, the variance of the phase parameters rapidly decreases with increasing n until an optimal value is reached. The theoretical variance closely follows the empirical one and, most importantly, successfully predicts the optimal choice of n, which for this case is $n_{opt} = 14$. This is of great practical importance since it implies that the theoretical variance expression can be used for optimal filter design. Result 3 only applies to linear phase signals (q = 1) and can therefore not be applied to this example of a quadratic FM signal (q = 3). Expressions for the optimal choice of n for q > 1 are under current investigation. An empirical investigation indicates, however, that $n_{opt}$ is close to inversely proportional to q.

Figure 1. Variance vs n (estimates of $a_2$)

Figure 2 illustrates how the variance depends on SNR. The theoretical variance is inversely proportional to SNR$^2$ for low SNR, and inversely proportional to SNR for medium and high values of SNR. The theoretical variance coincides well with the empirical one down to a threshold, say SNR$_T$, below which it no longer applies. For SNR values below SNR$_T$ the series expansion used to derive the theoretical variance is not valid.

The theoretical covariance expression also applies very well to small data sets ($N \approx 10$-$20$, SNR = 5 dB). Illustrations can be found in [2] but are not presented here due to lack of space.


Figure  2.  Variance  vs  SNR 


5  Concluding  remarks 

The theoretical covariance has been compared to empirical results for a wide range of N, SNR, filter lengths, and (non-uniform) sampling strategies. It has been verified that the theoretical expression accurately predicts the empirical variance for SNR and N down to a threshold. The threshold is, however, low and Result 2 can be applied to most scenarios of practical interest.
Abstract

This paper compares two algorithms for estimating the instantaneous frequency of complex signals: the high-order ambiguity function (HAF) and the polynomial Wigner-Ville distribution (PWVD). The comparison is made by asymptotic first-order error analysis, which is verified by simulation. It is shown that when the signal phase is a polynomial function of time, the HAF always outperforms the PWVD. Two other advantages of the HAF over the PWVD are: (a) it has lower computational complexity; (b) unlike the PWVD, its use is not limited to the instantaneous frequency at the middle of the observation interval.


1.  Introduction 

Let $s(t)$ be the complex signal

$$s(t) = a(t) \exp\{j\phi(t)\}, \quad 0 \le t \le T. \qquad (1)$$

The  instantaneous  frequency  (IF)  of  this  signal  is  defined 
as 

=  0<t<T.  (2) 

Estimation  of  the  IF  over  the  interval  [0,T]  from  noise- 
corrupted  measurements  of  s{t)  is  important  in  areas 

such  as  radar,  communication  and  sonar. 

This  paper  examines  two  methods  for  estimating  the 
IF.  The  methods  are: 

1. The high-order ambiguity function (HAF). This method models the phase function of the signal as an Mth-order polynomial

$$s(t) = C \exp\{j\phi(t)\}, \quad \phi(t) = \sum_{m=0}^{M} a_m t^m. \qquad (3)$$

The coefficients of the polynomial are estimated successively, starting at the highest order. The IF of the signal is obtained by differentiating the estimated phase polynomial. The definition of the HAF and the details of the algorithm are given in [1].

2. The Polynomial Wigner-Ville Distribution (PWVD), introduced by Boashash in [2], is defined as

$$\mathrm{PWVD}(t, \omega) = \int K^{(q)}(t, \tau)\, e^{-j\omega\tau}\, d\tau, \qquad (4)$$

where 

r“>(<,r)  =  n[++«,^)|  .  (5) 

1=0 

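and the range of $\tau$ is limited by $\min[t, T - t] / \max\{c_k\}$. The IF is estimated as

$$\hat{\omega}(t) = \arg\max_{\omega} |\mathrm{PWVD}(t, \omega)|. \qquad (6)$$

The kernel parameters are chosen so as to make $\hat{\omega}(t)$ unbiased.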

This paper compares the two methods from the points of view of accuracy and complexity. Accuracy is evaluated by analytic derivation of the asymptotic errors, which are then verified by simulations. The analytic derivation of the asymptotic errors of the HAF has been done before [3], but that of the PWVD is new. The analysis is limited to the case of constant amplitude polynomial phase signals.

The paper gives the details of the analysis of the two algorithms, followed by a selected set of simulation results.

2. Derivation of the error formulas

In this section, we present the analytic derivation of the asymptotic error formula for the PWVD. The details of the error formula for the HAF can be found in [3]. The results obtained are asymptotic as the number of data points tends to infinity. The derivations do not impose any restriction on the SNR.

2.1. Preliminaries

The signal model is

$$y(n) = z(n) + w(n), \qquad (7)$$

where $z(n)$ is a unit-modulus polynomial phase discrete-time signal, defined by

$$z(n) = \exp\{j\phi(n)\}, \quad \phi(n) = \sum_{m=0}^{M} a_m n^m, \qquad (8)$$

and $w(n)$ is complex circular white Gaussian noise with variance $\sigma^2$. We can rewrite (7) as

$$y(n) = z(n)[1 + w(n)/z(n)] = z(n)[1 + v(n)], \qquad (9)$$

where $v(n)$ is probabilistically equivalent to $w(n)$, that is, white and Gaussian, with moments given by

$$E\{[v(n)]^k [v^*(n)]^l\} = 0, \quad k \neq l,$$
$$E\{[v(n)]^k [v^*(n)]^k\} = k!\, \sigma^{2k}, \quad k = 0, 1, 2, \ldots \qquad (10)$$

The  following  two  formulas  will  be  needed  later: 

£j[i-K")]  [*«■(”)] )  =  g 
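
As a sketch of how expectations of products of powers of $1 + v(n)$ and $1 + v^*(n)$ follow from (10), assume non-negative integer exponents $k$ and $l$ (an assumption made here for illustration) and expand both factors with the binomial theorem. By (10), only the terms with equal powers of $v$ and $v^*$ survive:

```latex
\begin{align*}
E\{[1+v(n)]^k\,[1+v^*(n)]^l\}
  &= \sum_{i=0}^{k}\sum_{m=0}^{l} \binom{k}{i}\binom{l}{m}\,
     E\{[v(n)]^i\,[v^*(n)]^m\} \\
  &= \sum_{i=0}^{\min(k,l)} \binom{k}{i}\binom{l}{i}\, i!\, \sigma^{2i}.
\end{align*}
```

In particular, $l = 0$ gives $E\{[1+v(n)]^k\} = 1$, and $k = l = 1$ gives $1 + \sigma^2$.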

The noise, being Gaussian, has bounded moments. Let $\{X_n\}$ be a sequence of (real or complex) random variables, and $\{a_n\}$ a sequence of positive real numbers. We will use the notation $X_n = O_m(1)$ to mean that all the moments of $X_n$ are bounded uniformly in n, that is: for every positive integer k there exists a positive constant $B(k)$ such that $E|X_n|^k < B(k)$ for all n. The notation $X_n = O_m(a_n)$ will mean that $X_n / a_n = O_m(1)$.

Both the HAF and the PWVD algorithms search for the maximum point of a discrete Fourier transform (DFT). We therefore need a formula expressing the perturbation of the maximum point of the DFT as a function of the measurement error. Let $\omega_0 \in [-\pi, \pi]$ and

$$y(n) = e^{j\omega_0 n} + a(n), \qquad (13)$$

where $\{a(n)\}$ is some additive interference. Let $Y(\omega)$ denote the DFT of $y(n)$, also let $A^{(k)}(\omega)$ denote the kth derivative of the DFT of $a(n)$ with respect to $\omega$ (with k = 0 denoting the DFT itself). Introduce the following assumptions:

• Assumption A: $a(n) = O_m(1)$;

•  Assumption  B:  '*^)  • 

Under assumptions A and B one can show [3] that

where $\hat{\omega}$ is the point of local maximum of $|Y(\omega)|$.


2.2. The error formula for the PWVD


As mentioned before, the PWVD algorithm includes maximization of the DFT of the signal (9) after the application of the transformation (5). The discrete version of the transformation kernel is

K^f{n,m)  =  Y^y{n+c^m)\''\y[n  +  c_,m)\  .  (15) 

Jfc=0 

Passing  the  input  signal  (9)  through  the  above  kernel, 
yields  the  signal 

A'-'''  [n,m)  =  exp|  jlKtn  /F(n) j  [  1  +  a„(»i)] ,  (16) 

where  is  defined  by 

1^ 


Jfc=0 

|^(l+v(n +c.*/n))  j  -1 

and  IF{n)  is  the  instantaneous  frequency  at  time  n. 
The  mean  of  a„[m)  is 

1 4 

F\a„{m)]^E  Hi !  +  ''(« ‘ 


(17) 


U+v(n+c_*m)j  -1 


(18) 


For  the  case  m^Q,  the  two  terms  in  the  product  are 
independent,  and  the  mean  can  be  expressed  as 

9^2  1  4 


4.» 


(19) 


-1 


E  (l  +  v(/7 +c.*/n))  j 

By  (11) 

1  +  4-  Cj^  m  )i‘‘  n+c.*ffi))  j  =1,(20) 

and  £[o„(w)]  =  1-1  =  0. 
Next we need to prove that $a_n(m)$ meets the requirements of assumption A. Each moment of $a_n(m)$ is a finite sum of moments of $v(n)$. Since $v(n)$ is $O_m(1)$, any finite sum of its moments is bounded. This leads to the conclusion that $a_n(m)$ is also $O_m(1)$ and therefore meets the requirements of assumption A.

We will now prove that $a_n(m)$ also meets the requirements of assumption B. Denote

=  ,  (21) 
m=I 

then 

•  (22) 

m=l 

We  will  define  a„{m)  to  be 

=  (23) 

Notice that $a_n(m)$ and $\tilde{a}_n(m)$ are probabilistically

equivalent. For the case $m \neq l$ we have

E[a„{m)a„{l)\  =  0  (24) 


=  (25) 


The  sequence  1<  ^  <  A/'j  is  zero  mean  i.i.d. 


Hence 


]  ^  kwf 


4K>K)r]=™^,o(, ^ 

Due to the fact that all moments of $a_n(m)$ are bounded, $a_n(m)$ meets assumption B.

Now that we have proved that $a_n(m)$ meets all the requirements of (14), we can apply (14). This yields the following error formula


Sa  =  12A?‘^9l 


I  A^'\cu,)  -70.5A^[  ‘|  +  0„(n-^) 


«12Af-’3]£;(«-0.5A^)a(«) 

U=i 

The  variance  of  the  error  is 


£•[  fiw'  ]  =  E  144A-'3|  -  o.5N)a„{n)  I 

^  J‘  .  (28) 

3|^(w-o.5A^)a„’(»i)| 

We will use the equality

$$E[\Im\{x\}\,\Im\{y\}] = 0.5\,\Re\big[E[x y^*] - E[x y]\big], \qquad (29)$$

to get

N  N 

e\  8co^  =  11N~^  Y.n^-05N){m-05N) 

.  (30) 

Since $a_n(m)$ is zero mean i.i.d., the cross-terms vanish and (30) becomes

N 

e\  =72A^“*^(»2-0.57V)^ 

-=■  .  (31) 

e[ 

In  order  to  get  a  closed  form  error  formula  we  need  to 
develop  an  estimate  for  «„(/«)  a„*(7M)j  and 

■ 

n^a„{m)a^'{m)^  =  E  j^IT[  1  +  v{n  +  Cj/n)]  ‘‘ [(>  +  '(  -”))■] ‘“-i]  (32) 

n  +  Cj^  m))‘]  ‘[l  +  v(  n+ 

Using  (11)  and  (12),  (32)  simplifies  to 

r  1  f  **  CZ)  V  'l 

£  “.('"jo.'w  =n  s  7 

i-=n  .^n  \*  y 


I  r*  -1. 


Following  a  similar  route,  we  can  show  that 

=  (34) 

Inserting  (34)  and  (33)  in  (30)  leads  to 

E[Sco^\  =  llN-^E^^.f^{n-0.5Nf  ,  (35) 

n=\ 

where 

(bV  V  fh  V  '\ 

zH^-i.  (36) 

k=o  \  /=o  v»  y  j  ,=o  \»  /  ^ 

For  the  selection  =b_i^,  E^^.  is 
?/2  r  A.  y  y 

^..■=n  s  ‘  h'-"  -1.  (37) 

*=o\i=ov  y 
It is easy to prove that

$$\sum_{n=1}^{N} (n - 0.5N)^2 = \frac{N^3}{12} + O(N^2) \qquad (38)$$
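
The sum in (38) can be checked exactly. The short Python sketch below (the function name is an illustrative choice) evaluates $\sum_{n=1}^{N}(n - 0.5N)^2$ in rational arithmetic and compares it with the leading term $N^3/12$; expanding the square shows that the exact value is $N^3/12 + N/6$, so the remainder is well within the stated order.

```python
from fractions import Fraction


def centered_square_sum(N):
    """Exact value of sum_{n=1}^{N} (n - N/2)^2 using rational arithmetic."""
    half = Fraction(N, 2)
    return sum((n - half) ** 2 for n in range(1, N + 1))


# Ratio of the exact sum to the leading term N^3 / 12 for the window
# sizes used in Table 1; the ratio approaches 1 as N grows.
for N in (512, 1024, 2048):
    exact = centered_square_sum(N)
    leading = Fraction(N ** 3, 12)
    print(N, float(exact / leading))
```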

Using  (38),  we  can  derive  the  following  estimate  of  (35), 

(39) 

3.  Comparison  of  the  two  algorithms 

In this section, we compare the variance of the IF estimates of the two algorithms. The comparison is done using the error formulas derived above. We will also compare the complexity of the two algorithms.

In Figure 1, we compare the variance of the IF estimates at the middle of the observation interval. We can see that the variance of the HAF estimate is consistently lower than that of the PWVD estimate.

In Figure 2, we compare the performance of the algorithms at points other than the midpoint of the interval. We can see that while the HAF algorithm performs quite well for a large part of the observation interval, the PWVD estimates degrade quite rapidly as we move away from the midpoint of the interval.

In Table 1, we compare the derived error formula for the PWVD to simulation results. We can see that the analytic estimates agree quite well with the simulations.


Figure  1-  The  variance  of  the  IF,  estimated  in  the 
middle  of  the  sampling  interval,  as  a  function  of 
input  noise  variance.  Phase  polynomial  order  = 
4.  (a)  PWVD  (b)  HAF  (c)  CRB. 

The computational complexity of the two algorithms is about the same as far as IF estimation is concerned. However, the PWVD requires a preliminary step of data resampling (interpolation), which increases its computational complexity. In addition, once the polynomial parameters have been computed, the HAF algorithm makes it possible to estimate the IF at any desired point in the interval. The PWVD, on the other hand, requires repetition of the entire procedure for each new time point.


Figure 2- The error variance in the estimation of the IF as a function of n/N. $\sigma^2 = 0.1$; (a) PWVD (b) HAF (c) CRB.

Table 1- Comparison between the estimate of the error variance using (39) and simulation results, for various values of noise variance and for various window sizes.

N            σ² = 0.1    σ² = 0.01    σ² = 0.001
512          8.8612      0.6268       0.0531
1024         8.7624      0.6400       0.0514
2048         8.6891      0.5771       0.0589
Estimated    8.6391      0.6226       9.0602
Abstract

The aim of this work is the parameter estimation of polynomial-phase signals (PPS) embedded in white noise. The proposed estimation algorithm is a generalization of the method based on the Polynomial-Phase Transform (PPT), able to solve an ambiguity problem that appears when applying the PPT to PPSs having the same highest order phase coefficients. The proposed approach is based on the intersection of two (or more) signal subspaces sharing only the useful components but not the undesired spurious harmonics. Different signal subspaces are obtained by exploiting a multilag definition of the PPT.

1.  Introduction 

The aim of this work is the parameter estimation of multicomponent polynomial-phase signals (MC-PPS) embedded in additive white noise (AWN). The proposed approach is a generalization of the approaches proposed in [6], [7] (see also [9], Chapter 12). The case of MC-PPS was already analysed in [8] and [10] using the Polynomial-Phase Transform, introduced in [7], later called High order Ambiguity Function (HAF) [9]. The HAF allows the parameter estimation of multicomponent polynomial-phase signals, thus providing a clear advantage with respect to alternative techniques based on the computation of the instantaneous phase, followed by polynomial fitting [4], which are not able to deal with multicomponent signals. Of course, being nonlinear, the HAF suffers from the presence of cross-terms when applied to multicomponent signals. In general, as the number of samples increases, the effect of the cross-terms diminishes. However, when the signal components share the same highest order phase coefficients, the HAF exhibits spurious peaks that make the detection and parameter estimation ambiguous [1]. In such a case even an increase of the number of samples does not help to remove the ambiguity. Since this situation is common to a number of applications where the polynomial-phase modelling can be of interest, like Synthetic Aperture Radar (SAR) signal processing [11] or communications over channels affected by multipath propagation [3], it is important to provide an accurate analysis of the ambiguity problem together with a possible solution. In this work we propose a solution based on an algebraic approach that exploits the redundancy of the Multi-Lag HAF (ML-HAF), introduced in [2]. The degrees of freedom related to the choice of the lags present in the ML-HAF can be exploited to solve the ambiguity problem. In this work we will show how to take advantage of these degrees of freedom using an algebraic approach based on the projection of the observed signal onto a signal subspace obtained as the intersection of signal subspaces estimated using different sets of lags.

The paper is organized as follows. In section 2 we will review the multilag HAF and the associated ambiguity problem. In section 3, we will describe the Signal-Subspace Intersection (SSI) method. Section 4 shows some performance results obtained by simulation.

2. The multilag instantaneous high order moment and the ambiguity problem

Given a discrete-time signal $s(t)$, with $t = 0, \ldots, T-1$, its M-th order multi-lag High order Instantaneous Moment (ml-HIM) $s_M(t; \tau_1, \ldots, \tau_{M-1})$ is defined by the following iterations [2]:

$$s_1(t) = s(t), \quad s_2(t; \tau_1) = s_1(t + \tau_1)\, s_1^*(t - \tau_1), \quad \ldots$$
$$s_M(t; \tau_1, \ldots, \tau_{M-1}) = s_{M-1}(t + \tau_{M-1}; \tau_1, \ldots, \tau_{M-2})\, s_{M-1}^*(t - \tau_{M-1}; \tau_1, \ldots, \tau_{M-2}) \qquad (1)$$
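
The recursion in (1) can be sketched in a few lines of Python. The function name `ml_him`, the handling of the signal edges (only time indices for which all shifted samples exist are kept) and the phase convention $2\pi \sum_m a_m t^m / m!$ are assumptions made for this illustration. Under that convention, a noise-free cubic-phase signal yields a third-order ml-HIM that is a single sinusoid at $4 a_3 \tau_1 \tau_2$ cycles per sample.

```python
import numpy as np


def ml_him(s, lags):
    """Multi-lag HIM via s_{m+1}(t) = s_m(t + tau_m) * conj(s_m(t - tau_m)).

    `lags` holds positive integer lags (tau_1, ..., tau_{M-1}). Returns the
    ml-HIM on the time indices where every shifted sample exists, together
    with the index of the first such time instant.
    """
    x = np.asarray(s, dtype=complex)
    start = 0
    for tau in lags:
        # Element i of the product corresponds to the time index i + tau.
        x = x[2 * tau:] * np.conj(x[:-2 * tau])
        start += tau
    return x, start


# Noise-free cubic-phase example (hypothetical coefficient values)
T = 256
t = np.arange(T)
a1, a2, a3 = 0.05, 2e-4, 2e-6
s = np.exp(2j * np.pi * (a1 * t + a2 * t**2 / 2 + a3 * t**3 / 6))

tau1, tau2 = 20, 30
s3, t0 = ml_him(s, (tau1, tau2))

# Frequency (cycles/sample) from the average phase increment of the ml-HIM
f_est = np.angle(np.mean(s3[1:] * np.conj(s3[:-1]))) / (2 * np.pi)
f_expected = 4 * a3 * tau1 * tau2
```

With noisy data the phase-increment estimate would normally be replaced by an FFT peak search or a subspace method.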

We use the acronym ml-HIM as a generalization of the HIM, introduced in [12]. The PPT [7] or HAF [9] is defined as the Fourier Transform of the ml-HIM, with respect to t, in the particular case in which the lags $\tau_k$ are all equal to each other. When applied to multicomponent polynomial-phase signals, the ml-HIM contains both auto-terms (the useful terms) and cross-terms. In particular, given an input signal:

sit)  =  ^2  (2) 

fci=i 

the ml-HIM satisfies the following properties (the proofs are given in [3]):

Prop. 1: The auto-terms are complex sinusoids whose frequencies are: $f_k = 2^{M-1} \prod_{i=1}^{M-1} \tau_i\, a_{k,M}$, for $k = 1, \ldots, K$.

Prop. 2: If the PPS components share the highest order polynomial-phase coefficients ($a_{k,l} = a_l$, for $l > m$ and $m \geq 1$), the ml-HIM contains spurious harmonics, besides the useful harmonics given by the auto-terms. In particular, if the PPS components share all the phase coefficients from the second up to the M-th order, the ml-HIM contains only harmonics, whose frequencies are:

M-l  2"-^ 

fk  =  Ml  ]][  Ti  OM  +  (Ofci.l  - 

t=l  *=1 

(3) 

Prop. 3: If the PPS components have the same highest order phase coefficients ($a_{k,l} = a_l$, for $l > m$ and $m \geq 1$), the only sinusoids present in the ml-HIM whose frequency is proportional to the product of all the lags have a frequency $f_k = 2^{M-1} \prod_{i=1}^{M-1} \tau_i\, a_{k,M}$.

From these properties, it is possible to envisage the parameter estimation technique. According to Prop. 1, if the input signal is a polynomial-phase signal of degree M, its M-th order ml-HIM is a sinusoid whose frequency is proportional to the highest order phase coefficient of the signal; the estimation of the highest order phase coefficient can then be recast as a conventional frequency estimation problem, which can be solved using FFT-based methods, as in [7], or using signal-subspace projection methods, as anticipated in [6]. Once the highest order coefficient has been estimated, the degree of the signal polynomial phase can be lowered by multiplying the input signal by $\exp[-j2\pi \hat{a}_M t^M / M!]$, where $\hat{a}_M$ is the estimated coefficient. If the estimation is correct, the degree decreases and the process can be iterated to estimate the lower order coefficients. Indeed, Prop. 2 reveals the existence of spurious sinusoids when the input signal is composed of PPSs having the same highest order phase coefficients. However, using Prop. 3 it is possible to solve the ambiguity problem. In [1] the ambiguity was eliminated by multiplying the Fourier transforms (properly scaled) of the ml-HIMs corresponding to different sets of lags. In this work, we propose an algebraic approach, potentially able to provide better resolution than the non-parametric FFT-based approach.

3.  The  Signal-Subspace  Intersection  Method 

The freedom in the choice of the lags used for computing the ml-HIM can be properly exploited to remove the ambiguity. Let us compute the ml-HIMs $s_M(t; \tau_1^{(l)}, \ldots, \tau_{M-1}^{(l)})$ corresponding to L different sets of lags $\tau_k^{(l)}$, for $l = 1, 2, \ldots, L$. We will assume that each pair of sets satisfies the condition c1: $\prod_{k=1}^{M-1} \tau_k^{(i)} / \prod_{k=1}^{M-1} \tau_k^{(j)} = I/J$, with $i \neq j$ and where I and J are integer numbers. According to Prop. 3, if the input signal contains K PPSs of degree M, the ml-HIMs contain sinusoidal auto-terms whose frequencies are related by the following relationship: $f_k^{(i)} = I f_k^{(j)} / J$, having indicated by $f_k^{(i)}$ the frequency of the k-th auto-term, $k = 1, \ldots, K$, corresponding to the i-th set of lags, $i = 1, \ldots, L$. To compare different ml-HIMs, it is in general necessary to resample them. More specifically, to compare the generic i-th with the j-th ml-HIM, the i-th ml-HIM has to be downsampled by a factor J, whereas the j-th ml-HIM has to be downsampled by a factor I. After downsampling, according to Prop. 3, the two ml-HIMs share some sinusoids in common: the sinusoids corresponding to the auto-terms. To extract the information about the common sinusoids, we can then use an algebraic approach based on the estimation of the signal subspaces associated to different ml-HIMs and on their intersection. The algorithm, denoted Signal-Subspace Intersection (SSI) algorithm, is the following (the algorithm is described in the case of two sets of lags, for simplicity, but the generalization to L sets is straightforward):

1. Compute the ml-HIMs $s_M^{(i)}$, $i = 1, 2$, for two different sets of lags satisfying condition c1, with given values of $I$ and $J$;

2. If $I \neq J$, resample the ml-HIMs;

3. Estimate the covariance matrices $R^{(i)}$ corresponding to the two ml-HIMs;

4. Compute the Singular Value Decomposition (SVD) of each covariance matrix: $R^{(i)} = U^{(i)} \Sigma^{(i)} V^{(i)H}$;

5. Select the common order $d$ as the greatest of the orders estimated from the singular values of each covariance matrix;

6. Estimate the signal subspaces as the spaces spanned by the columns of the matrices $S^{(i)}$ defined as

$$S^{(i)} = [u(1)^{(i)}, \ldots, u(d)^{(i)}] \qquad (4)$$

where $u(k)^{(i)}$ denotes the $k$-th column of $U^{(i)}$, for $i = 1, 2$;

7. Compute the intersection between the subspaces $S^{(1)}$ and $S^{(2)}$ (see [5]):

(a) Compute $A = S^{(1)H} S^{(2)}$;

(b) Compute the SVD of $A$: $A = Y \Sigma Z^H$;

(c) Select order: $\Sigma$ is a diagonal matrix whose entries are the cosines of the principal angles $\theta_k$ between the two subspaces $S^{(1)}$ and $S^{(2)}$ [5]. If we sort the cosines in decreasing order, the dimension $d$ of the intersection subspace can be estimated as the index such that the following relations hold: $\cos(\theta_1) = \ldots = \cos(\theta_d) = 1 > \cos(\theta_{d+1})$;

(d) Estimate the intersection space as the space spanned by the columns of the matrix $E$ defined as follows: $E = S^{(1)} [y(1), \ldots, y(d)]$, where $y(k)$ stands for the $k$-th column of the matrix $Y$;

8. Estimate the pseudo-spectrum as the squared norm of the projection of the steering vector $e(\omega)$ onto the intersection subspace: $p(\omega) = e^H(\omega) E E^H e(\omega)$, where $e(\omega)$ is the steering vector of complex exponentials at frequency $\omega$ and $N_f$ is the number of samples on the frequency axis.
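The subspace part of the SSI algorithm (steps 3 to 8) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sliding-window covariance estimate, the tolerance `tol` used to decide which cosines count as 1, and the steering vector $e(\omega) = (1, e^{j\omega}, \ldots, e^{j(m-1)\omega})^T$ are choices made here, and the two input sequences are assumed to be already resampled (step 2).

```python
import numpy as np


def covariance_matrix(x, m):
    """Sample covariance of x built from sliding snapshots of length m (step 3)."""
    snaps = np.lib.stride_tricks.sliding_window_view(np.asarray(x), m)
    return snaps.T @ snaps.conj() / snaps.shape[0]


def signal_subspace(R, d):
    """Orthonormal basis of the d-dimensional signal subspace (steps 4 and 6)."""
    U, _, _ = np.linalg.svd(R)
    return U[:, :d]


def subspace_intersection(S1, S2, tol=1e-6):
    """Intersection of span(S1) and span(S2) via principal angles (step 7)."""
    A = S1.conj().T @ S2
    Y, cosines, _ = np.linalg.svd(A)        # cosines come sorted in decreasing order
    d = int(np.sum(cosines > 1.0 - tol))    # angles that are (numerically) zero
    return S1 @ Y[:, :d], cosines


def pseudo_spectrum(E, n_freq):
    """p(w) = ||E^H e(w)||^2 on n_freq equispaced frequencies in [0, 2*pi) (step 8)."""
    m = E.shape[0]
    omega = 2.0 * np.pi * np.arange(n_freq) / n_freq
    steering = np.exp(1j * np.outer(np.arange(m), omega))  # one column per frequency
    return omega, np.sum(np.abs(E.conj().T @ steering) ** 2, axis=0)


def ssi(x1, x2, m, d, n_freq=1024, tol=1e-6):
    """Chain steps 3-8 for two sequences that share some sinusoids."""
    S1 = signal_subspace(covariance_matrix(x1, m), d)
    S2 = signal_subspace(covariance_matrix(x2, m), d)
    E, cosines = subspace_intersection(S1, S2, tol)
    omega, p = pseudo_spectrum(E, n_freq)
    return omega, p, cosines
```

For two noise-free sequences that share a single sinusoid, only the first cosine is close to 1 and the pseudo-spectrum peaks at the shared frequency. With noisy data the tolerance has to be loosened into a threshold, as discussed in the performance section.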

4  Performance 

For comparison, Figs. 1 and 2 show the HAF and the pseudo-spectrum obtained using the SSI algorithm, for a signal composed of the sum of two cubic-phase signals having the same amplitude and the following phase parameters: $a_{1,1} = 0.125$, $a_{1,2} = 0.25/N$, $a_{1,3} = 0.25/N^2$, $a_{2,1} = 0.5$, $a_{2,2} = 0.5/N$, $a_{2,3} = a_{1,3}$; the number of samples is 1440; two sets of lags have been used: $\tau_1^{(1)} = \tau_2^{(1)} = 240$ and $\tau_1^{(2)} = 240$, $\tau_2^{(2)} = 120$ (so that $I = 2$ and $J = 1$). In Fig. 1 we can clearly see three peaks, two of which (the lateral ones) are spurious. Conversely, the pseudo-spectrum exhibits only one peak. In the presence of noise and with finite-length sequences, the order $d$ at step 7(c) of the SSI algorithm can be estimated by using a threshold, because the cosines of the principal angles are themselves random variables. From simulation (using 1000 Monte Carlo runs), we have observed that, for the signal analyzed in Fig. 1 embedded in white noise (SNR = 10 dB), the average values of $\cos(\theta_1)$ and $\cos(\theta_2)$ are 0.9991 and 0.7862, respectively, and the corresponding standard deviations are 5.7e-4 and 5.8e-2. Therefore the two random variables are well separable for SNR = 10 dB. The performance of the method in the presence of noise has been evaluated by computer simulations. In particular, Fig. 3 shows the standard deviation of the estimate of the third-order coefficient $a_3$ versus the input SNR. The input signal is the same as the one analyzed in Figs. 1 and 2, plus white noise. From Fig. 3, we can see that the variance of the HAF-based method does not decrease as the SNR increases, due to the ambiguity problem.
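The threshold test on the cosines of the principal angles can be sketched as follows. The threshold of 0.95 is an assumption, chosen here only because it lies between the two reported averages; the text does not state which threshold was used.

```python
import numpy as np


def estimate_intersection_order(cosines, threshold=0.95):
    """Count the principal-angle cosines that reach the threshold."""
    cosines = np.asarray(cosines, dtype=float)
    return int(np.sum(cosines >= threshold))


# Averages reported above for SNR = 10 dB
order = estimate_intersection_order([0.9991, 0.7862])
```

With the reported statistics, a threshold of 0.95 sits about 2.8 standard deviations above the average of $\cos(\theta_2)$ and more than 80 standard deviations below the average of $\cos(\theta_1)$.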

5.  Conclusion 

In this paper we have proposed an estimation algorithm able to remove the ambiguity related to the HAF when applied to multi-component signals having the same highest-order phase coefficients. The price paid by the proposed algorithm, besides the higher computational cost, is that the estimation variance is greater than that achievable with the HAF if the ambiguity could be properly removed. The higher error is due to errors in the estimate of the covariance matrices, which propagate to the estimates of the signal subspaces and, as a consequence, to the intersection subspace. Further analyses are necessary to optimize the SSI algorithm in terms of the size of the covariance matrix and the number of intersections, parameters that greatly affect the final performance. It is important to point out that the intersection idea could be extended to different signal processing problems, whenever it is possible to set up different experiments where only the useful signals are in common.


Figure 1: HAF of the sum of two cubic-phase signals.

Figure 2: SSI pseudo-spectrum of the same signal as in Fig. 1.
Abstract

In this paper we introduce and analyze the so-called Complex Sign WVD (CS-WVD), defined as the Wigner-Ville Distribution (WVD) where one of the two signals is substituted by its complex sign. The substitution provides a considerable simplification for the implementation on dedicated hardware. In particular, the number of multiplications is drastically reduced. In spite of the hard nonlinearity used in the CS-WVD, the new transform is still able to deal with multi-component chirp signals. In the paper we provide a statistical analysis of the introduced transformation, in the case of polynomial-phase signals embedded in additive white Gaussian noise. The theoretical analysis is compared to simulation results and to the Cramér-Rao lower bounds.

1  Introduction 

Time-frequency distributions (TFDs) such as the Wigner-Ville Distribution (WVD) are particularly suited for the analysis of Linear Frequency Modulation (LFM) signals [4], [1]. The WVD is known for its good localization properties, but it suffers from high cross-terms. A more general family of distributions, namely Cohen's class [3], has been introduced for designing TFDs showing a good compromise between resolution and cross-terms [3]. Indeed, the members of Cohen's class can all be expressed as smoothed versions of the WVD. In this work we will concentrate on the WVD, but the proposed approach can be directly extended to the general case. One of the drawbacks of the WVD is its higher computational cost with respect, for example, to the Short-Time Fourier Transform (STFT). In this paper we will introduce the Complex Sign-WVD (CS-WVD), aimed at drastically reducing the number of complex multiplications necessary to compute the WVD. The use of the complex sign introduces a performance loss, but gives rise to a transformation which is much easier to implement, especially on dedicated hardware, because it simplifies the scaling problem and does not require any multiplication for computing the transformation kernel (the product of the signals). The aim of this paper is the statistical analysis of a method for estimating the parameters of polynomial-phase signals, based on the CS-WVD. In spite of the hard nonlinearity introduced in the transformation, the proposed method is still able to deal with multicomponent signals, if the number of samples is sufficiently high. Indeed we will prove that the CS-WVD of LFM signals tends to coincide with the WVD of the same signals, as the number of samples increases. The paper is organized as follows. In Section 2 we will define and give some examples of the CS-WVD. We will also show the asymptotic properties of the CS-WVD. In Section 3 we will give a statistical analysis of the CS-WVD in the presence of additive white Gaussian noise (AWGN). Finally, in Section 4, we will analyze a parameter estimation method, for polynomial-phase signals, based on the CS-WVD.

2  Complex-Sign  Wigner-Ville  Distribution 

In  this  section,  we  will  introduce  the  CS-WVD  and  an¬ 
alyze  its  asymptotic  properties,  as  the  number  of  samples 
tends  to  infinity. 
2.1 Definition and examples

The Wigner-Ville Distribution of an infinite-length signal $s(t)$ is defined as [3]:

$$WVD_s(t, f) := \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} s(t + \tau)\,\overline{s}(t - \tau)\,e^{-j4\pi f\tau}\,d\tau \qquad (1)$$

We define the Complex Sign-WVD (CS-WVD) of a signal $s(t)$ as:

$$CSW_s(t, f) := \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} s(t + \tau)\,\overline{\mathrm{csign}}(s(t - \tau))\,e^{-j4\pi f\tau}\,d\tau \qquad (2)$$

where the complex sign of a complex variable $z$ is defined as:

$$\mathrm{csign}(z) := \frac{1}{\sqrt{2}}\,\mathrm{sign}(\Re(z)) + j\,\frac{1}{\sqrt{2}}\,\mathrm{sign}(\Im(z)) \qquad (3)$$

and the overbar denotes conjugation. The use of the complex sign makes the computation of the kernel multiplication-free.

Indeed, we can prove the following theorem:

Theorem: Given an LFM signal $s(t) = Ae^{j\phi(t)} = Ae^{j2\pi(a_0 + a_1 t + a_2 t^2)}$, its CS-WVD tends to be proportional to its WVD, as the number of samples tends to infinity:

$$\lim_{T \to \infty} CSW_s(t, f) = \frac{4}{\sqrt{2}\,A\pi}\,WVD_s(t, f) \qquad (4)$$

Proof. Using the Fourier series expansion of the complex sign of $Ae^{j\phi}$,

$$\mathrm{csign}[Ae^{j\phi}] = \ldots \sum_{n=1}^{\infty} \ldots \qquad (5)$$

we can single out the first term, thus obtaining:

$$CSW_s(t, f) = \frac{4}{\sqrt{2}\,A\pi}\,WVD_s(t, f) + \ldots \qquad (6)$$

We can prove that the second term in (6) is null. In fact, for LFM signals all the integral arguments are quadratic-phase functions, whose second-order coefficient is certainly different from zero, for $n > 1$. Since the integral of a quadratic-phase exponential over the whole real axis is finite (7), all the integrals give a finite result. Therefore, the limit in the second term of (6) is equal to zero. [q.e.d.]

Examples of application of the CS-WVD to linear frequency modulation (LFM) signals are shown in Fig. 1, for a monocomponent signal, and in Fig. 2 for two LFM components. We can observe that, in spite of the hard nonlinearity, the CS-WVD still allows the detection of chirp signals, even in the multicomponent case. This capability improves as the number of samples increases. Figs. 1 and 2 have been obtained using a number of samples $N = 128$.

Figure 1: CS-WVD of an LFM signal with $a_1 = 0.25$ and $a_2 = 0.25/N$ (axes: frequency, time).

Figure 2: CS-WVD of the sum of two LFM signals having $a_{1,1} = 0.25$, $a_{2,1} = 0.75$, $a_{1,2} = 0.25/N$, and $a_{2,2} = -0.25/N$ (axes: frequency, time).
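To illustrate the definitions, the following NumPy sketch evaluates one time slice of a discrete WVD and of the corresponding CS-WVD for a noise-free LFM signal, so that the peak locations can be compared. It is a minimal illustration, not the authors' code: the function names, the symmetric lag range, the $1/\sqrt{2}$ scaling of the complex sign, and the signal parameters ($a_1 = 0.1$, $a_2 = 0.1/N$, chosen here so that the instantaneous frequency stays inside the unambiguous range $[0, 0.5)$) are all assumptions.

```python
import numpy as np


def csign(z):
    """Complex sign: hard-limit real and imaginary parts (assumed 1/sqrt(2) scaling)."""
    z = np.asarray(z)
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)


def wvd_slice(x, n0, freqs, use_csign=False):
    """Slice at time n0 of sum_k x(n0+k) conj(y(n0-k)) exp(-j 4 pi f k).

    y is x itself (WVD) or csign(x) (CS-WVD). Lags are limited to the
    largest symmetric range that stays inside the record.
    """
    kmax = min(n0, len(x) - 1 - n0)
    k = np.arange(-kmax, kmax + 1)
    second = csign(x[n0 - k]) if use_csign else x[n0 - k]
    kernel = x[n0 + k] * np.conj(second)
    return np.exp(-4j * np.pi * np.outer(freqs, k)) @ kernel


# Hypothetical test signal: instantaneous frequency a1 + 2*a2*n
N = 128
n = np.arange(N)
a1, a2 = 0.1, 0.1 / N
x = np.exp(2j * np.pi * (a1 * n + a2 * n**2))

freqs = np.arange(512) / 1024            # grid on [0, 0.5)
n0 = N // 2
f_wvd = freqs[np.argmax(np.abs(wvd_slice(x, n0, freqs)))]
f_cswvd = freqs[np.argmax(np.abs(wvd_slice(x, n0, freqs, use_csign=True)))]
```

Because `csign` only takes the values $(\pm 1 \pm j)/\sqrt{2}$, forming the CS-WVD kernel amounts to sign changes and additions of the real and imaginary parts of `x`, which is the hardware simplification the text refers to.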
3 Statistical analysis

A statistical analysis of the CS-WVD of chirp signals in the presence of Additive White Gaussian Noise (AWGN) has been carried out by computing its expected value and variance. In particular, referring to its discrete-time version and to the case of a limited number of samples, the CS-WVD takes this form:

$$CSW_x(n, f) = \sum_{k=1}^{N} x(n+k)\,\overline{\mathrm{csign}}(x(n-k))\,e^{-j4\pi fk} \qquad (8)$$

where $x(n) = s(n) + w(n)$, $s(n)$ is the useful signal and $w(n)$ is AWGN. The moments of the CS-WVD can be computed using the cumulant series expansion introduced in [5]. Under the hypothesis of Gaussian random variables, the expansion is greatly simplified because all the cumulants of order higher than two are equal to zero. In particular, the expected value of the CS-WVD is:

$$E\{CSW_x(n, f)\} = \sum_{k=1}^{N} s(n+k)\,R(n-k)\,e^{-j4\pi fk} \qquad (9)$$

and the second order moment is:


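$$E\{|CSW_x(n,f)|^2\} = \sum_k \sum_l \big[\, s(n+k)\,s(n+l)\,R(n-k)\,R(n-l) + \ldots + \ldots\,\delta(l+k)\,s(n+k)\,S(n-k)\,R(n-l) + \ldots\,\delta(l+k)\,s(n+l)\,R(n-k)\,S(n-l) + \ldots\,S(n-k)\,S(n-l) \,\big] \ldots \qquad (10)$$

where $Q(x)$, $R(n)$ and $S(n)$ are defined as follows:

$$S(n) = \ldots\left(e^{\ldots} + e^{\ldots}\right) \qquad (11)$$

$$R(n) = \ldots\left(1 - 2Q(\ldots)\right) + j\,\ldots\left(1 - 2Q(\ldots)\right) \qquad (12)$$

$$Q(x) = \ldots \qquad (13)$$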

4  Parameter  estimation 


Given an LFM signal embedded in AWGN, we will now propose a method for estimating the phase parameters based on the CS-WVD. We initially define the transformation:

$$(g, h) \mapsto \sum_n CSW_x(n, g + hn) = \sum_n \sum_k x(n+k)\,\overline{\mathrm{csign}}(x(n-k))\,e^{-j4\pi(g + hn)k} \qquad (14)$$

which provides a mapping from the input signal onto a plane whose axes are the signal mean frequency and sweep rate. For each chirp we observe a peak in the plane $(g, h)$. Therefore detection and parameter estimation are carried out together: if a peak exceeds a suitable threshold, we decide for the presence of a chirp whose parameters are the peak's coordinates. The overall mapping was introduced in [1]. The method is asymptotically efficient and provides a good rejection capability in the presence of multicomponent signals. Its main disadvantage is the computational cost. Once again, this cost can be reduced by resorting to the complex sign. This possibility was already proposed in [2], with a transformation called the Hybrid-Nonlinear Integrated Generalized Ambiguity Function (HNL-IGAF). In [2] the performance was assessed by simulation; in this work we present a theoretical statistical analysis of the HNL-IGAF, based on the perturbation method, thus providing an analytical expression for the variances of both the frequency and sweep-rate estimates, valid under the hypothesis of high SNR.
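The mapping onto the (frequency, sweep-rate) plane can be sketched as a brute-force grid search. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the symmetric lag range at each time instant, the $1/\sqrt{2}$ scaling of the complex sign and the exponent $e^{-j4\pi(g + hn)k}$ are choices made here. With this convention a chirp $e^{j2\pi(a_1 n + a_2 n^2)}$ is expected to peak near $g = a_1$, $h = 2a_2$.

```python
import numpy as np


def csign(z):
    """Complex sign with the assumed 1/sqrt(2) scaling."""
    z = np.asarray(z)
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)


def hnl_igaf_map(x, g_grid, h_grid):
    """Sum over n of the CS-WVD evaluated along the line f = g + h*n."""
    N = len(x)
    hard = np.conj(csign(x))                 # hard-limited, conjugated signal
    out = np.zeros((len(g_grid), len(h_grid)), dtype=complex)
    for n in range(N):
        kmax = min(n, N - 1 - n)             # keep n+k and n-k inside the record
        k = np.arange(-kmax, kmax + 1)
        kernel = x[n + k] * hard[n - k]
        freq = g_grid[:, None] + h_grid[None, :] * n
        out += np.exp(-4j * np.pi * freq[:, :, None] * k) @ kernel
    return out


def estimate_chirp(x, g_grid, h_grid):
    """Return the (g, h) grid point where the modulus of the map is largest."""
    surface = np.abs(hnl_igaf_map(x, g_grid, h_grid))
    i, j = np.unravel_index(np.argmax(surface), surface.shape)
    return g_grid[i], h_grid[j]
```

The double loop over the grid and the lags makes the cost of this search evident; the only saving from the complex sign is that the kernel itself needs no multiplications.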

Given $x(n) = s(n) + w(n)$, where $w(n)$ is AWGN, and denoting by $\delta g$ and $\delta h$ the estimation errors of the frequency $g_0$ and sweep rate $h_0$ respectively, for high SNR we have [6]:

$$\delta g = \frac{Bv - Cu}{DC - B^2} \qquad (15)$$

$$\delta h = \frac{Bu - Dv}{DC - B^2} \qquad (16)$$

where $B = \ldots$, $C = \ldots$, $D = \ldots$,

$$v = -4\pi \sum_n \sum_k k\,n\,x(n+k)\,g(x(n-k))\,e^{\ldots}$$

$$u = -4\pi \sum_n \sum_k k\,x(n+k)\,g(x(n-k))\,e^{\ldots}$$

The variances of the estimates of the two highest-order coefficients are shown in Figs. 3 and 4, from which we can observe the good asymptotic agreement between theoretical and simulation results for SNR values above a certain threshold (about 10 dB). We can also observe that the agreement between theory and simulation improves as the number of samples increases. As expected, a saturation effect exists. The main price paid for using the complex sign is a kind of saturation effect at high SNR. Indeed, the variance of the estimate decreases more slowly than the inverse of the SNR, which is the rate predicted by the Cramér-Rao Lower Bound (CRLB). As regards the dependence on the number of samples, the estimate of the $m$-th order polynomial phase coefficient is characterized by a CRLB that decreases as the inverse of $N^{2m+1}$. In our case this behavior is approached only if $N$ is sufficiently high.

The estimator based on the HNL-IGAF can be proved to be consistent. In fact, considering an infinite-length LFM signal, we can prove (see [6] for the analytical details) that the expected value of the HNL-IGAF of an LFM signal embedded in AWGN tends to a Dirac pulse, centered on the signal's parameters, and its variance tends to zero, as the number of samples tends to infinity.

Figure 3: Variance of the estimates vs. SNR of the frequency $g_0$, obtained by using the HNL-IGAF: theoretical analysis (solid line); simulation (+).

Figure 4: Variance of the estimates vs. SNR of the sweep rate $h_0$, obtained by using the HNL-IGAF: theoretical analysis (solid line); simulation (+).

5.  Conclusions 

In this work we have proposed and analyzed a nonlinear method for analyzing linear frequency modulation signals that presents some advantages for implementation on dedicated hardware, because it strongly reduces the number of complex multiplications necessary to compute the Wigner-Ville Distribution. The main price paid for the simplification is a performance loss. The method can be extended to the more general Cohen's class of time-frequency distributions as well as to the high-order ambiguity functions.
Abstract

A novel method for determining the window parameters of the adaptive spectrogram is given in this paper. It is based on detecting the maximum value and the width of the peak of the Radon transform of the modulus of the ambiguity function of the signals*. The proposed method effectively reduces the cross-terms and the noise of linear frequency modulated signals compared with the Wigner distribution and the classical fixed-window spectrogram. A possible further extension of the method, to fit it to larger classes of signals, is also given.

I.  Introduction 

Time-frequency distributions (TFDs) are a powerful tool for the detection and analysis of time-varying signals. They have found use in various fields such as radar, sonar, speech, biomedicine and geophysics. Classical methods such as the Fourier transform cannot provide a clear representation of the relation between the time and frequency content of the signals. Various time-frequency analysis methods have been proposed. The TFDs of Cohen's class are widely studied [1]. The most prominent among them are the Wigner distribution (WD) and the spectrogram (the squared magnitude of the short-time Fourier transform, STFT). Although these two methods behave very differently in the analysis of time-varying signals, they can be interpreted from the same point of view, i.e. we may consider the Wigner distribution as a special STFT which uses the signal itself as the window function [2].

The cross-terms among the components of a multi-component signal are a severe limitation on the use of the WD. They arise because one component of the signal acts as the window applied to the other components of the signal. This effect can easily be avoided by the spectrogram, which uses only one window at any time [2]. But as pointed out in [3], the choice of the window dramatically affects the appearance of, or the signal concentration in, the spectrogram (or the STFT). We must face the tradeoff between the time and frequency resolution of a preselected window.

Several authors have proposed using a Gaussian function with variable length and obliquity, best matched to the signal, as the window of the STFT [3]-[5]. It was shown that this method greatly improves the time-frequency concentration of time-varying signals. The proposed methods for determining the parameters of the window are either computationally expensive or iterative. In this paper, we propose a method for determining the parameters of the window by making use of the Radon transform of the modulus of the ambiguity function (AF) of the signals. A systematic method for determining the parameters is given. The experimental results show that this method greatly improves the time-frequency concentration of signals even at low signal-to-noise ratio (SNR). A further extension of the method is also given in the fourth section of this paper.

* This work is supported by the Institute of Electronics Researching of China, grant no. J94.01.01-9461122 and DJ94.17.10-9571122.

II.  Adaptive  Window  Parameters 
Determining  Procedure 

The classical spectrogram uses a fixed low-pass function as the window. It can be considered an order-zero approximation of time-varying signals, since it uses a constant frequency in the window to represent the time-varying frequency content of the signals. So we must face the time-frequency tradeoff in the selection of the window length. Also, for some signals, a longer window does not mean better frequency resolution, since the frequency variation within the window may be large. If we allow the frequency content in the window to vary linearly, i.e. use a linear frequency modulated (LFM) window instead of the constant-frequency window as the basis to represent the signals, we get an order-one approximation of the time-varying signals. This certainly results in a more precise representation of the frequency content of the signals.

The spectrogram of a signal $x(t)$ is defined as:

$$Spec_x(t, \omega) = \left| \int x(\tau)\,g(t - \tau)\,e^{-j\omega\tau}\,d\tau \right|^2 \qquad (1)$$

where $g(t)$ is the window of the spectrogram (all integrals are from $-\infty$ to $\infty$ unless otherwise stated). Here we choose the form of $g(t)$ as:

$$g(t) = e^{-\alpha t^2 + j\beta t^2} \qquad (2)$$

where the parameter $\alpha$ controls the aspect ratio (or the length of the window) and the parameter $\beta$ controls the obliquity direction in the time-frequency plane (or the frequency modulation rate). These are the two parameters which we try to estimate adaptively for different signals. If $\alpha = 0.5$ and $\beta = 0$, then the WD of $g(t)$ is a circle in the $(t, \omega)$ plane, where $t$ is in seconds and $\omega$ in radians/sec. Hence we consider these parameters as defining an equal-resolution window in the $(t, \omega)$ plane.

As stated above, in any segment of the signal, we may represent the signal better using the window shown in (2) than using the classical fixed low-pass window. But this requires correct estimates of the parameters $\alpha$ and $\beta$. In the following part of this section, we estimate these parameters adaptively from the AF of the signal. The AF of a signal $x(t)$ is defined as:

$$AF_x(\nu, \tau) = \int x\left(t + \frac{\tau}{2}\right) x^*\left(t - \frac{\tau}{2}\right) e^{-j\nu t}\,dt \qquad (3)$$

As shown in [6], the modulus of the AF of any LFM signal is a line in the AF plane which passes through the origin. By calculating the Radon transform of the modulus of the AF through the origin, the two-dimensional function $|AF_x(\nu, \tau)|$ can be projected onto a one-dimensional function $P(\varphi)$ [6], which is defined as:

$$P(\varphi) = \mathcal{R}\{|AF_x(\nu, \tau)|\} = \int |AF_x(r\sin\varphi, r\cos\varphi)|\,dr \qquad (4)$$

where $\mathcal{R}$ represents the Radon transform, and $r$ and $\varphi$ are the radius and the angle of the polar coordinates of the AF plane. The range of $\varphi$ is $0 < \varphi \le \pi$.

We first assume the signal under analysis is of the same form as $g(t)$ shown in (2), because by changing the parameters it can approximate a large class of signals. It is not difficult to show that the AF of the signal $x(t)$ is:

$$AF_x(\nu, \tau) = \sqrt{\frac{\pi}{2\alpha}}\,\exp\left\{-\frac{4(\alpha^2 + \beta^2)\tau^2 - 4\beta\nu\tau + \nu^2}{8\alpha}\right\} \qquad (5)$$

Its Radon transform can be proved to be:

$$P(\varphi) = \frac{2\pi}{\sqrt{4(\alpha^2 + \beta^2)\cos^2\varphi - 4\beta\sin\varphi\cos\varphi + \sin^2\varphi}} \qquad (6)$$

By differentiating $P(\varphi)$ with respect to $\varphi$, it is also easy to find that the value $\varphi_{max}$ which maximizes $P(\varphi)$ satisfies the relation:

$$\tan 2\varphi_{max} = \frac{4\beta}{1 - 4(\alpha^2 + \beta^2)} \qquad (7)$$

For signals with an evident time-varying frequency feature, the time duration of the signal must be long, therefore $\alpha$ is usually small compared with $\beta$. So we may neglect $\alpha$ in (7). This results in an estimate of $\beta$:

$$\beta = \frac{\tan\varphi_{max}}{2} \qquad (8)$$

So by finding the direction of the maximum of $P(\varphi)$, we can estimate the parameter $\beta$ by (8).
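The chain from the AF of a chirped Gaussian to the estimate of $\beta$ can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes the window $g(t) = e^{-\alpha t^2 + j\beta t^2}$, the closed-form AF modulus $\sqrt{\pi/(2\alpha)}\,\exp\{-[4(\alpha^2+\beta^2)\tau^2 - 4\beta\nu\tau + \nu^2]/(8\alpha)\}$ and the closed-form projection $2\pi/\sqrt{\cdot}$ as reconstructed from the surrounding formulas, integrates the modulus along lines through the origin, and applies $\beta = \tan(\varphi_{max})/2$. Function names, grid sizes and parameter values are hypothetical.

```python
import numpy as np


def af_modulus(nu, tau, alpha, beta):
    """Modulus of the AF of exp(-alpha t^2 + j beta t^2) (assumed closed form)."""
    q = 4.0 * (alpha**2 + beta**2) * tau**2 - 4.0 * beta * nu * tau + nu**2
    return np.sqrt(np.pi / (2.0 * alpha)) * np.exp(-q / (8.0 * alpha))


def radon_through_origin(alpha, beta, phi, r_max=100.0, n_r=4001):
    """Numerical line integral of |AF| along nu = r sin(phi), tau = r cos(phi)."""
    r = np.linspace(-r_max, r_max, n_r)
    dr = r[1] - r[0]
    vals = af_modulus(r[:, None] * np.sin(phi), r[:, None] * np.cos(phi), alpha, beta)
    return vals.sum(axis=0) * dr


def radon_closed_form(alpha, beta, phi):
    """Closed-form projection P(phi) for the same signal."""
    d = (4.0 * (alpha**2 + beta**2) * np.cos(phi) ** 2
         - 4.0 * beta * np.sin(phi) * np.cos(phi) + np.sin(phi) ** 2)
    return 2.0 * np.pi / np.sqrt(d)


def estimate_beta(phi, p):
    """beta from the angle of the maximum of P(phi), neglecting alpha."""
    return np.tan(phi[np.argmax(p)]) / 2.0


phi = np.linspace(0.0, np.pi, 721, endpoint=False)
p_numeric = radon_through_origin(0.05, 0.5, phi)
beta_hat = estimate_beta(phi, p_numeric)
```

With $\alpha = 0.05$ and $\beta = 0.5$ the estimate is slightly biased away from 0.5, because $\alpha$ is neglected and the angle grid is finite; the bias shrinks as $\alpha/\beta$ decreases.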

We can further use the -3 dB width of the peak of $P(\varphi)$ to estimate the parameter $\alpha$. The physical meaning of this estimate rests on the observation that the wider the peak of $P(\varphi)$, the more the signal differs from an LFM signal, so the shorter the window length should be. For a signal without frequency modulation ($\beta = 0$ in (2)), it can be shown that (for $\alpha \ll 0.5$, i.e. a signal with a long duration):

$$\alpha = \frac{\Delta\varphi}{4} \qquad (9)$$

where $\Delta\varphi$ represents the -3 dB width of the peak of $P(\varphi)$. Although this estimate is derived without frequency modulation, and the width clearly differs from that of a signal with frequency modulation, the experimental results show that it is still a good estimate.
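The width-based estimate of $\alpha$ can be checked on the unmodulated case. The sketch below assumes the closed-form projection $P(\varphi) = 2\pi/\sqrt{4\alpha^2\cos^2\varphi + \sin^2\varphi}$ for $\beta = 0$, and it assumes that "-3 dB" means a drop to $1/\sqrt{2}$ of the peak value. Under these assumptions the half-width satisfies $\sin\varphi \approx 2\alpha$, so the full width is about $4\alpha$. Function names and the grid are hypothetical.

```python
import numpy as np


def radon_closed_form(alpha, beta, phi):
    """Closed-form projection P(phi) of the chirped Gaussian (assumed form)."""
    d = (4.0 * (alpha**2 + beta**2) * np.cos(phi) ** 2
         - 4.0 * beta * np.sin(phi) * np.cos(phi) + np.sin(phi) ** 2)
    return 2.0 * np.pi / np.sqrt(d)


def width_3db(phi, p):
    """Full width of the main peak where p stays above max(p)/sqrt(2)."""
    i = int(np.argmax(p))
    level = p[i] / np.sqrt(2.0)
    lo = i
    while lo > 0 and p[lo - 1] >= level:
        lo -= 1
    hi = i
    while hi < len(p) - 1 and p[hi + 1] >= level:
        hi += 1
    return phi[hi] - phi[lo]


def estimate_alpha(phi, p):
    """alpha from the -3 dB width of the peak, as in the beta = 0 analysis."""
    return width_3db(phi, p) / 4.0


# Angle grid centred on the peak at phi = 0 (P is periodic with period pi)
phi = np.linspace(-np.pi / 2, np.pi / 2, 20001)
alpha_hat = estimate_alpha(phi, radon_closed_form(0.02, 0.0, phi))
```

The approximation degrades as $\alpha$ approaches 0.5, where the peak of $P(\varphi)$ flattens out and the -3 dB width is no longer well defined.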

The above analysis is on the uniform time-frequency plane, i.e. the unit of $\nu$ is rad/sec and that of $\tau$ is seconds. For some practical applications, the uniform coordinates may result in $\varphi_{max} \approx \pi/2$, so a small error in the estimation of $\varphi$ will cause a great difference in $\beta$. In this case, we should use a nonuniform coordinate system in the $(\nu, \tau)$ plane. Assuming the unit of $\nu$ is $k$ rad/sec and that of $\tau$ is seconds, $\beta$ should be estimated as

$$\beta = \frac{k\tan\varphi_{max}}{2}$$

III.  Experimental  Results 


In this section we demonstrate the performance of the adaptive spectrogram using the parameters estimated by the methods described in Section II.

In the digital implementation of the above procedure, the AF is calculated from discrete samples. In calculating the Radon transform, two-dimensional cubic interpolation is used to obtain the AF values that do not fall on the discrete grid of the rectangular coordinate system.

The first example examines the resolution advantage of the adaptive spectrogram in analyzing two chirp signals with the same frequency modulation rate. The envelopes of the signals are Gaussian functions. The adaptive spectrogram with both $\alpha$ and $\beta$ estimated by (9) and (8) is shown in Fig. 1(a). Fig. 1(b) shows the spectrogram with fixed $\alpha = 0.5$ and $\beta$ estimated by (8). For comparison, Fig. 1(c) and (d) show the WD and the fixed-window spectrogram ($\alpha = 0.5$, $\beta = 0$) respectively. Please note that the sampling rate of the WD is twice that of the spectrogram. It is evident that the adaptive spectrogram has nearly the same auto-component concentration as the WD but without its cross-terms. Compared with the classical fixed-window spectrogram, the resolution of the adaptive spectrogram is much higher.

In Fig. 2, we show the same signal as in Fig. 1 but now with additive noise. The signal-to-noise ratio (SNR) is 1 dB. The adaptive spectrogram still resolves the two signals clearly and almost without distortion, while the signal is totally buried in noise in its WD and is greatly distorted by the classical spectrogram. The benefit of the adaptive spectrogram comes from the fact that the window nearly plays the role of a matched filter for the signals, so the noise effect is greatly reduced.

IV.  Further  Extension 

One obvious limitation of the adaptive spectrogram is that it is best suited to chirp signals with nearly the same frequency modulation rate. A further extension of this method is currently under investigation to make it suitable for larger classes of signals. For example, if two chirp signals with different frequency modulation rates are under analysis, this will result in two different peaks in $P(\varphi)$. In this case, we may determine $\beta$ by finding the weight center of $P(\varphi)$. This oblique direction can provide a good tradeoff between the different requirements of the signals.

It is also easy to extend the adaptive spectrogram to signals with a varying frequency modulation rate. We can simply apply the adaptive spectrogram repeatedly to each time segment of the signals.

V.  Conclusions 

In this paper, we proposed an adaptive spectrogram for the analysis of time-varying signals. It is based on the Radon transform of the modulus of the AF of the signals to determine the window parameters. The adaptive spectrogram can yield better results than the Wigner distribution and the classical spectrogram, especially in a noisy background. This conclusion is demonstrated by experimental results. Since the adaptive spectrogram can be considered an order-one approximation to the time-varying features of signals, compared with the order-zero approximation of the classical fixed-window spectrogram, and can be applied to any time segment of the signals, it provides a more precise model of time-varying signals. A possible extension of the adaptive spectrogram to larger classes of signals is also given.
Figure 1. (a) Adaptive spectrogram with both $\alpha$ and $\beta$ estimated. (b) Adaptive spectrogram with $\alpha$ fixed. (c) Wigner distribution. (d) Fixed-window spectrogram.

Figure 2. The same signal as in Fig. 1 but with additive noise. (a) Adaptive spectrogram with both $\alpha$ and $\beta$ estimated. (b) Adaptive spectrogram with $\alpha$ fixed. (c) Wigner distribution. (d) Fixed-window spectrogram.
Abstract

By equipping the base stations of a wireless network with antenna
arrays, it is possible to more fully exploit the spatial dimension in
a wireless communication system. Multiple antennas can provide a
processing gain to increase the base station range and improve
coverage. Also, by exploiting the spatial selectivity of an antenna
array, interference may be reduced, which in turn can be traded for
increased capacity of the system. A wide range of wireless
communication systems may benefit from spatial processing, including
high mobility cellular systems, low mobility short range systems,
wireless local loop applications, satellite communications and
wireless LAN. By employing an array of antennas, it is possible to
multiplex channels in the spatial dimension just as in the frequency
and time dimensions. This is often referred to as Spatial Division
Multiple Access (SDMA). To increase system capacity, spatially
selective reception as well as spatially selective transmission must
be achieved. Herein, we present some different approaches and
techniques for spatial/temporal processing. Critical aspects of SDMA
for both high mobility cellular systems and low mobility or movable
systems will be reviewed and the potential benefits examined.


1. Introduction

Wireless communications represent an important area of research,
ultimately leading to the development of new and improved services
and products. Substantial improvement in the capacity of these
systems is a key issue as their use becomes more widespread. The
dramatic expansion of mobile communications over recent years has
emphasized the importance of efficient use of frequency bandwidth.
There is an increasing demand for capacity in wireless systems, which
traditionally translates directly into a demand for more bandwidth,
which is quite limited. Also, the infrastructure investment costs are
often a limiting factor when deploying a new system aimed at wide
area coverage. Increasing the range of current systems is therefore
also of great interest.

There are two critical factors in the design of wide area mobile
communication systems: coverage and capacity. These factors have a
direct impact on the cost and quality of the services since the
spectral resources are limited and spectral efficiency is necessary.
The spatial dimension is to a large extent unexplored in wireless
systems. Traditional telecommunication schemes multiplex channels in
frequency and/or time. However, the spatial dimension is in general
used in a very rudimentary fashion by, for example, using some
frequency channels in certain geographical areas (frequency planning)
to limit interference. By incorporating antenna arrays and efficient
spatial-temporal processing techniques into future systems, both the
capacity and the range may be increased. With proper processing, it
is possible to multiplex channels in the spatial dimension just as in
the frequency and time dimensions. Spatially selective reception and
transmission can reduce interference in the system significantly,
allowing frequencies to be reused more often and thereby increasing
capacity.

Each user has a unique spatial-temporal signature as seen by the base
station. By identifying this signature for the user-to-base station
communication link (up link), the signal of interest may be extracted
from the noise while suppressing interference. Furthermore, with
knowledge of the spatial-temporal signature describing the base
station-to-user (down link) channel, transmission schemes may be
devised which maximize the power of the signal of interest at the
user while minimizing co-channel interference and suppressing overall
radiated power. This offers substantial capacity increases over
current wireless system implementations.

In [2, 13, 28, 26, 31, 32], efficient use of the spatial dimension by
employing antenna arrays at the base stations of wireless
communication systems is explored. The up link problem has received
substantial attention [2, 26, 19, 20, 24] whereas the down link
problem has drawn interest more recently [13, 28, 32]. Of course, the
hardware requirements are more demanding when employing antenna
arrays with multiple receivers and transmitters, but this permits a
sparser infrastructure and will often be more cost effective. In
general, increasing the range of cellular systems is of great
interest initially, for example, when deploying the new PCS system in
the United States. However, demand for increased system capacity is
expected to follow shortly after adequate coverage is achieved in a
successful system installation.

2.  Modeling  the  Communications  Channel 

When attempting to exploit the spatial dimension, an additional
independent parameter must be used when discussing channel models.
The channel impulse response is now vector valued and will depend on
the spatial distribution of the multipath propagation as well as the
antenna array aperture and configuration. When designing a
communications system, the channel model is critical. Given a
description of the channel, efficient processing schemes may be
devised and system performance can be analyzed.

There are low rank as well as high rank channel models and these
concepts have impact on the spatial processing. The rank of the
channel is also coupled to the concepts of narrow band and wide band
signals which we use in the temporal domain. To exemplify this,
consider a white noise sequence (which is wide band of course)
arriving from broadside at a linear array. This is a spatial channel
of rank one since the propagation for this case is described by a
constant vector. However, as soon as the direction of the signal
differs from broadside, the channel becomes high rank. The
propagation of a perfect narrow band signal (sinusoid) is of course
always described by a low rank channel. However, as the delay spread
of the channel increases to the same order as the symbol time of a
narrow band communications signal, the rank of the channel increases.
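
The broadside example can be checked numerically. The following Python sketch is an illustration only and is not taken from the paper: it assumes a flat-spectrum (band-limited white) signal with fractional bandwidth `frac_bw` arriving at a uniform linear array, for which the correlation between two elements is a sinc function of their delay difference. The function names and the threshold used to count significant eigenvalues are hypothetical choices.

```python
import numpy as np

def wideband_covariance(m, spacing_wl, theta_deg, frac_bw):
    """Spatial covariance of a flat-spectrum signal at an m-element ULA.

    Assumption (not from the paper): the signal spectrum is flat with
    width frac_bw * fc around the carrier fc, so its autocorrelation is
    sinc(B * tau) * exp(j 2 pi fc tau).
    """
    k = np.arange(m)
    # Propagation delay at each element, in units of carrier periods.
    tau = k * spacing_wl * np.sin(np.deg2rad(theta_deg))
    diff = tau[:, None] - tau[None, :]
    # np.sinc(x) is sin(pi x) / (pi x).
    return np.sinc(frac_bw * diff) * np.exp(2j * np.pi * diff)

def effective_rank(R, rel_tol=1e-6):
    """Number of eigenvalues above rel_tol times the largest one."""
    eig = np.linalg.eigvalsh(R)
    return int(np.sum(eig > rel_tol * eig.max()))

# 8 elements, half-wavelength spacing, 50 % fractional bandwidth.
rank_broadside = effective_rank(wideband_covariance(8, 0.5, 0.0, 0.5))
rank_off = effective_rank(wideband_covariance(8, 0.5, 60.0, 0.5))
print(rank_broadside, rank_off)
```

At broadside all elements see the same waveform, so the covariance is a constant matrix of rank one; away from broadside the delay differences decorrelate the elements and more than one eigenvalue becomes significant.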

Another critical concept when discussing spatial processing is that
of a parameterized array manifold. The array manifold is the
collection of all array responses to a single point source over the
parameter range (for example location) of interest. This is only a
useful concept if the number of parameters and signals is small in
relation to the number of antennas (for example the direction to the
source for a fixed frequency) and the array response can be measured
or modeled fairly accurately as a function of the parameters of
interest. For example, near field scattering or mutual coupling at an
array which is not calibrated and does not have a nice structure¹ is
very difficult to model. The low rank signal model may still be quite
useful even when it is not possible to parameterize the array
response. In these cases, the response of the array or
spatial-temporal signature characterizes a user. By noting this fact,
the spatial dimension may be used to separate signals.

¹ An equi-spaced linear array with identical elements (uniform linear
array) has a nice structure.


Below, we discuss some different channel models and also the use of a
parameterized model. The concept of an array manifold can be modified
to incorporate the special propagation environment often present in
wireless communications. First, a simple low rank propagation model
incorporating Rayleigh fading and directional information is
described. This model is valid for narrow band signals and high base
station antenna placement with little near field scattering. Second,
a high rank channel model is described which is more suited for large
time delay spreads and significant near field scattering at the
array.

2.1. A Low Rank Channel Model

In [23, 31] a model of the flat fading due to local scattering is
developed taking the spatial dimension into account. The array
response is modeled as a stochastic vector which has a parameterized
distribution. These parameters provide a useful description of the
channel. The propagation between the mobile and the array is modeled
as a superposition of a large number of rays originating from local
scatterers in the vicinity of the mobile. We assume independent
scattering, an angular distribution of the scatterers which is
Gaussian (as seen from the array), and that the relative time delays
for different propagation paths are small compared to the inverse of
the bandwidth of the communication signal (small delay spread).

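Assuming a uniform linear array with element spacing $\Delta$ in
wavelengths, the signal received at the array may be modeled as

$$\mathbf{x}(t) = \mathbf{v}\, s(t) + \mathbf{n}(t) \tag{1}$$

$$\mathbf{v} \in N(0, \mathbf{R}(\theta, \sigma)) \tag{2}$$

$$\mathbf{R}(\theta, \sigma) \approx \mathbf{a}(\theta)\mathbf{a}^*(\theta) \odot \mathbf{B}(\theta, \sigma) \tag{3}$$

$$\mathbf{a}(\theta) = \left[ 1,\; e^{j 2\pi\Delta \sin\theta},\; \ldots,\; e^{j 2\pi\Delta (m-1) \sin\theta} \right]^T \tag{4}$$

$$[\mathbf{B}(\theta, \sigma)]_{kl} = e^{-2[\pi\Delta(k-l)]^2 \sigma^2 \cos^2\theta} \tag{5}$$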

where $\mathbf{x}(t)$ is a complex valued ($m \times 1$) vector,
$s(t)$ is the complex envelope of the transmitted signal,
$\mathbf{n}(t)$ is the additive noise, $\odot$ denotes element-wise
multiplication, and $\mathbf{v}$ is the channel or spatial signature,
which is a complex Gaussian random vector with a distribution
function parameterized by the nominal direction to the mobile,
$\theta$, and the angular spread (standard deviation), $\sigma$; see
Figure 1.

Equations (1)-(5) model the Rayleigh fading of the channel taking the
spatial dimension into account. The vector $\mathbf{a}(\theta)$ is
often termed the array response vector and represents the array
output to a point source from direction $\theta$. The angular spread,
$\sigma$, is a critical parameter since it is a measure of the
deviation from the point source or plane wave model. Frequency
selective fading may be incorporated in this model by adding time
delayed versions of the signal with different spatial
characteristics. Also, interfering sources on the same frequency
channel may easily be incorporated into the model.

Figure 1. Geometry of the model characterizing the local scattering
in the vicinity of the mobile.
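
As a rough illustration of how the model in (1)-(5) can be simulated, the Python sketch below builds the covariance of the spatial signature and draws one fading realization. It reads (5) as a matrix B with entries exp(-2[pi Delta (k-l)]^2 sigma^2 cos^2 theta); the function names, the array size, the noise level and the unit-modulus test signal are all assumptions made for the example, not values from the paper.

```python
import numpy as np

def steering_vector(m, spacing_wl, theta):
    """Array response a(theta) of an m-element ULA, as in (4)."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing_wl * k * np.sin(theta))

def signature_covariance(m, spacing_wl, theta, sigma):
    """Covariance R(theta, sigma) of the spatial signature, (3) and (5)."""
    a = steering_vector(m, spacing_wl, theta)
    k = np.arange(m)
    diff = k[:, None] - k[None, :]
    B = np.exp(-2.0 * (np.pi * spacing_wl * diff) ** 2
               * sigma ** 2 * np.cos(theta) ** 2)
    return np.outer(a, a.conj()) * B        # element-wise product with B

def draw_signature(R, rng):
    """Draw v ~ CN(0, R) using the eigendecomposition of R."""
    eigval, eigvec = np.linalg.eigh(R)
    eigval = np.clip(eigval, 0.0, None)     # guard against round-off
    z = (rng.standard_normal(len(eigval))
         + 1j * rng.standard_normal(len(eigval))) / np.sqrt(2.0)
    return eigvec @ (np.sqrt(eigval) * z)

rng = np.random.default_rng(0)
m, spacing_wl = 8, 0.5                      # hypothetical array
theta, sigma = np.deg2rad(20.0), np.deg2rad(3.0)
R = signature_covariance(m, spacing_wl, theta, sigma)
v = draw_signature(R, rng)                  # one Rayleigh fading realization

s = np.exp(2j * np.pi * rng.random(100))    # unit-modulus test signal
noise = 0.1 * (rng.standard_normal((m, 100))
               + 1j * rng.standard_normal((m, 100))) / np.sqrt(2.0)
x = np.outer(v, s) + noise                  # snapshots following (1)
```

With `sigma = 0` the matrix B is all ones and R collapses to the rank one point source covariance; increasing `sigma` spreads the energy over more eigenvalues.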


2.1.1  Propagation  Modeling  and  Data  Experiments 

The spatial channel model described above has been validated against
experimental data collected by Ericsson Radio Systems. In the field
experiments, a transmitter was placed in urban areas with no line of
sight, approximately 1 km from the receiving array, which was
elevated 30 meters above the ground [4]. The data has been processed
to gain insight into propagation effects as well as into the behavior
of some receiving algorithms. The standard deviation, $\sigma$, of
the angular distribution is a critical parameter for SDMA systems
[32]. In [23, 31] the angular spread is found to be between two and
six degrees in the experiments when the transmitter is placed 1 km
from the receiving array. In Figure 2, the estimated directions and
angular spreads along with their associated standard deviations are
displayed for a number of trials at one location.

The model above is only reasonable for small angular spreads. When
the spread is large, which is the case in small cells (short range)
or with significant near field scattering at the array, the spatial
signature cannot be parameterized by the direction.


Figure  2.  Estimated  directions  and  angular 
spreads  in  degrees  with  standard  deviations 
versus  trial  number. 


2.2. A High Rank Channel Model

Below, a model is developed which is appropriate when there is a
large delay spread among the multipaths and parameterization in terms
of direction is not possible. This model is to some degree common to
the so-called blind signal separation or channel estimation
techniques. A paper by Tong et al. [22] in 1991 sparked great
interest in the research community in blind channel estimation based
on oversampled digital communications signals. A synchronously symbol
sampled signal provides a sufficient statistic for detection;
however, it does not allow the unique identification of the channel
from second order statistics. Synchronization requires timing
recovery and this is often achieved through oversampling in relation
to the symbol rate of the signal. The oversampling may be achieved
either in space or time and results in a cyclo-stationary process
when viewed as a scalar process. However, if cast in an appropriate
vector measurement model, the vector valued signal is stationary.
Under certain identifiability conditions [16, 18], the channel may be
consistently estimated from the second order statistics of the vector
valued process.

Below, we will first view the oversampling as spatial; thereafter,
oversampling in time will be introduced as well. By casting this
model in an appropriate spatial-temporal vector form, the low rank
nature of the signals is apparent.

Assume that a signal $s(t)$ is transmitted from a user; then the $m$
element array output, $\mathbf{x}(t)$, is given by

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_m(t) \end{bmatrix} = \mathbf{h} * s(t) + \mathbf{n}(t) = \mathbf{h}\, \mathbf{s}_L(t) + \mathbf{n}(t), \tag{6}$$

where $\mathbf{n}(t)$ is the noise and

$$\mathbf{s}_L(t) = \begin{bmatrix} s(t) \\ s(t-1) \\ \vdots \\ s(t-(L-1)) \end{bmatrix}, \tag{7}$$

$$\mathbf{h} = [\mathbf{h}(1), \mathbf{h}(2), \ldots, \mathbf{h}(L)]. \tag{8}$$

The impulse response of the channel, antenna elements, receiver and
transmitter filters is modeled by $\mathbf{h}$ and will be termed
simply the channel response. It will be modeled as finite with
channel length $L$. If the channel is low rank, it may be modeled by
a single complex vector ($L = 1$). However, if the propagation time
of $s(t)$ across the aperture of the array is on the same order as
the symbol time of the signal, a higher rank channel model must be
used.

Remark

Note that in cases where a small number of dominant multipath signals
with large time delays are present, the low rank channel model is
still useful. With appropriate spatial processing, the individual
multipaths may be separated. Several options are available for
combining the signal in the temporal domain [10]: pre-detection,
post-detection, soft-combining, etc.

A  high  rank  channel  model  is  expected  to  be  appropriate 
in  environments  with  severe  multipath  and  delay  spread,  for 
example,  wireless  local  area  networks  with  very  high  data 
rates. 

2.3.  Joint  Spatial-Temporal  Model 

If $L < m$ the signal of interest is not full rank and subspace
methods may be used to estimate the column space spanned by the
channel $\mathbf{h}$ and determine the row space spanned by the
signal, provided that $\mathbf{h}$ is rank $L$. In order to determine
the actual signal, the finite alphabet property must be exploited,
for example as in [21]. If $L > m$, the signal is no longer low rank
in the spatial domain and joint spatial-temporal processing is
required. This can be achieved by either oversampling or forming a
sliding window or both.

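Consider oversampling the received signal by a factor $P$ and let the
symbol time be 1. We have

$$\mathbf{x}\!\left(t + \tfrac{i-1}{P}\right) = \mathbf{h}^i\, \mathbf{s}_L(t) + \mathbf{n}^i(t), \quad i = 1, \ldots, P. \tag{9}$$

The vector of all the oversampled antenna outputs is given by

$$\mathbf{x}^P(t) = \begin{bmatrix} \mathbf{x}(t) \\ \mathbf{x}(t + \frac{1}{P}) \\ \vdots \\ \mathbf{x}(t + \frac{P-1}{P}) \end{bmatrix} = \mathbf{H}\, \mathbf{s}_L(t) + \mathbf{n}^P(t), \tag{10}$$

where

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \vdots \\ \mathbf{h}^P \end{bmatrix}, \quad \mathbf{n}^P(t) = \begin{bmatrix} \mathbf{n}(t) \\ \mathbf{n}(t + \frac{1}{P}) \\ \vdots \\ \mathbf{n}(t + \frac{P-1}{P}) \end{bmatrix}. \tag{11}$$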

The channel matrix, $\mathbf{H}$, is ($mP \times L$) and thus, if
$mP > L$, the signal of interest will be confined to a low rank
subspace of the joint $mP$-dimensional spatial and temporal
measurement space.

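Now, introduce a sliding window and form the vector

$$\mathbf{X}(t) = \begin{bmatrix} \mathbf{x}^P(t) \\ \mathbf{x}^P(t-1) \\ \vdots \\ \mathbf{x}^P(t-(M-1)) \end{bmatrix} = \mathcal{H}\, \mathbf{s}_{L+M-1}(t) + \mathbf{N}(t), \tag{12}$$

where

$$\mathbf{N}(t) = \begin{bmatrix} \mathbf{n}^P(t) \\ \mathbf{n}^P(t-1) \\ \vdots \\ \mathbf{n}^P(t-(M-1)) \end{bmatrix}. \tag{13}$$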

The channel matrix, $\mathcal{H}$, is ($mPM \times (L + M - 1)$) and
has special structure due to the sliding window. In [6, 12, 17], it
is noted that the channel matrix is linearly parameterized with
respect to the channel coefficients, lending itself to a two step
subspace fitting approach to estimate the channel. Under appropriate
identifiability conditions [16, 18], the channel may be estimated up
to a scaling and this in turn may be used to estimate the signals.
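
The special structure of the sliding-window channel matrix can be made concrete with a short Python sketch. It assumes the block-Toeplitz form implied by stacking (10) over M symbol periods, with each block row holding H shifted one column further to the right; the function name, the dimensions and the BPSK symbols are hypothetical and chosen only for the check.

```python
import numpy as np

def sliding_window_channel(H, M):
    """Block-Toeplitz channel matrix for a window of M symbol periods.

    H has shape (m*P, L); the result has shape (m*P*M, L + M - 1).
    Block row i holds H shifted i columns to the right.
    """
    mP, L = H.shape
    Hcal = np.zeros((mP * M, L + M - 1), dtype=complex)
    for i in range(M):
        Hcal[i * mP:(i + 1) * mP, i:i + L] = H
    return Hcal

rng = np.random.default_rng(1)
m, P, L, M = 2, 2, 3, 4                     # hypothetical dimensions
H = rng.standard_normal((m * P, L)) + 1j * rng.standard_normal((m * P, L))
Hcal = sliding_window_channel(H, M)

s = rng.choice([-1.0, 1.0], size=50)        # BPSK symbols (an assumption)

def s_vec(t, n):
    """Return [s(t), s(t-1), ..., s(t-(n-1))]."""
    return s[t - np.arange(n)]

# Noise-free check of (12): stacking x^P(t), ..., x^P(t-(M-1)) must equal
# the block-Toeplitz matrix applied to the longer symbol vector.
t = 20
stacked = np.concatenate([H @ s_vec(t - i, L) for i in range(M)])
direct = Hcal @ s_vec(t, L + M - 1)

# The matrix is tall, so the signal part of X(t) is confined to a subspace.
is_tall = Hcal.shape[0] > Hcal.shape[1]
```

Here the matrix is 16 x 6, so the signal occupies at most a 6-dimensional subspace of the 16-dimensional measurement space, which is what subspace methods exploit.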

The model above is easily extended to multiple signals by letting
$\mathbf{s}(t)$ be a $d$-vector with the complex amplitudes of the
$d$ signals and $\mathbf{h}(k)$ be an ($m \times d$) matrix of
channel coefficients. Since the signal can be viewed as a low rank
process in this measurement space, subspace based methods may be
applied to separate the signals from the noise. Furthermore, since
the span of the channel matrix may be identified, the influence of
the channel may be removed from the signals. In the presence of
multiple signals, temporal characteristics are required to estimate
the individual signal sequences; see Section 3.1.2.


3.  Exploiting  the  Spatial  Dimension 

To achieve increased range in a wireless communication system, it may
be argued that the mobile to base communication (up link) is the
critical link. It is desirable that the mobiles operate at low powers
and thus, for acquisition, the base stations must be able to detect
weak signals of short duration in a noisy and possibly interfering
environment. In the down link (base to mobile communication),
increased range may be achieved by, for example, increasing the
transmit power.

To achieve increased system capacity by employing an array of
antennas at the base stations, the frequency reuse distance may be
decreased [4, 15] or the frequency channels may be reused within a
cell [32] (or a combination thereof). In both cases, the interference
in the system induced by other users is of course increased. In the
up link, this is manifested by the cross-talk problem. Mobiles
operating on the same channel (frequency/time slot) with dramatically
different signal amplitudes caused by, for example, fading are
difficult to separate. It is difficult to adequately suppress the
stronger signal when estimating the weaker signal, resulting in
cross-talk. In some sense the down link problem may be even more
severe, especially in frequency division duplex (FDD) systems [32].
The fading caused by local scattering around the mobile (or the base
station) is observable in the up link but unobservable in the down
link due to the uncorrelatedness of the fading processes at the
different frequencies. The up and down link channels are not
reciprocal. The down link problem has received limited attention. In
[15] a method is proposed which does not exploit directional
information whereas in [32] a model based approach using this
information is proposed.

3.1.  Up  Link  Processing 

When receiving communication signals at an antenna array, the
proposed signal processing methods for distinguishing different
messages can be grouped in two main categories: those that exploit
array response information and those that do not. Assuming the low
rank channel model with small spread angle described above, it is
possible to use direction estimation techniques which use array
response information to separate signals. These methods will be
referred to as using directional information and include techniques
proposed in, e.g., [32, 2, 19].

The other class of methods makes few or no assumptions on the array
response but relies on other properties for separating the signals.

3.1.1  Directional  Information 

Due to the local scattering, the spatial signature represented by
$\mathbf{v}$ does not belong to the array manifold, i.e.,

$$\mathbf{v} \neq \mathbf{a}(\theta), \quad \text{for any } \theta. \tag{14}$$

This may be interpreted as the wavefront at the array not being
planar. It may also be interpreted as spatial diversity, i.e., the
correlation between antenna elements decreases with distance; this is
seen in the structure of the second moment of $\mathbf{v}$ in (3).
The flat fading becomes less severe at the array as the diversity
increases, i.e., as $\sigma$ increases. Techniques that make no use
of directional information, e.g., [26], efficiently exploit this fact
and perform better as the angular spread increases. Methods that are
based on directional information, $\mathbf{a}(\theta)$, for
estimating the signals [2, 14] will in general deteriorate as the
angular spread becomes larger. These methods, which are related to
traditional beamforming techniques, are derived from a point source
model. This behavior is not surprising since $\mathbf{v}$ will not
correspond to an array response vector for any $\theta$.

In [11], the directional error caused by local scattering is analyzed
and characterized for different estimators which make use of array
manifold information. The error is in general small and, if the goal
is to increase the range of a cellular system, this model error is
not critical [29]. However, the situation is quite different when
attempting to host multiple mobiles on the same frequency channel.
Even a small directional error can cause a significant degradation in
the estimates of the signals. Since the array manifold vector in the
nominal direction, $\theta$, differs from the spatial signature, an
error will be made when determining the copy vectors using the point
source model. Consider the case when two signals are present and data
is collected during a short period in time so that the users may be
considered stationary. In a fading environment, the signal strengths
of the two signals can be quite different. Thus, a small error in
suppressing the stronger signal will cause a significant decrease in
the signal to interference and noise ratio (SINR) of the weaker
signal. One way of improving the performance in these situations is
to modify the array manifold model. The estimate of the signal
subspace may be quite accurate and this information can be used to
obtain an improved estimate of the spatial signatures.

We will provide a simple modification to the point source model which
yields improved estimates of the signal waveforms. For small angular
spreads, the spatial signature can be approximated as a linear
combination of the array manifold vector and its derivative

$$\mathbf{v} \approx \mathbf{a}(\theta_i, \rho_i) = \mathbf{a}(\theta_i) + \rho_i\, \mathbf{d}(\theta_i), \quad \mathbf{d}(\theta_i) = \left. \frac{\partial \mathbf{a}(\theta)}{\partial \theta} \right|_{\theta = \theta_i}. \tag{15}$$

This may be viewed as a generalized array manifold,
$\mathbf{a}(\theta_i, \rho_i)$, parameterized by $\theta_i$ and
$\rho_i$. Assume that estimates of $\theta$ and of the signal
subspace, $\mathbf{E}_s$, have been obtained from the data. We may
pose the problem of finding the $\rho$ that provides the best fit
between the signal subspace and the generalized manifold. This is a
subspace fitting problem where $\rho$ is a linear parameter in the
manifold and can thus be solved for in a least squares sense. Let
$\mathbf{A}(\theta, \rho) = [\mathbf{a}(\theta_1, \rho_1) \ldots
\mathbf{a}(\theta_d, \rho_d)]$ where $d$ is the number of signals.
The following minimization problem

$$\hat{\rho} = \arg\min_{\rho} \mathrm{Tr}\{ \mathbf{A}^*(\theta, \rho)(\mathbf{I} - \mathbf{E}_s \mathbf{E}_s^*)\mathbf{A}(\theta, \rho) \} \tag{16}$$

can be solved explicitly to provide an estimate of $\rho$.

Consider the following example where two signals, 50 dB and 30 dB (on
average) above the spatially and temporally white noise, are present.
An 8 element uniform linear array is used, the nominal directions to
the sources are 0° and 20° and assumed known, in each trial 100
snapshots are collected, and the signals are copied (estimated) using
the so-called deterministic weight vectors,

$$\hat{\mathbf{s}}(t) = (\mathbf{A}^*\mathbf{A})^{-1}\mathbf{A}^*\mathbf{x}(t). \tag{17}$$

In Figure 3, the average SINR over 500 independent noise and channel
realizations² is displayed for the weaker signal as a function of the
angular spread in degrees. For this case, the improvement obtained by
using the generalized array manifold can be as much as 10 dB on
average; much larger improvements may be obtained for certain channel
realizations. The estimator above is quite straightforward; it is
certainly possible to jointly estimate $\theta$ and $\rho$.
Alternative generalizations of the array manifold are also possible.
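
The fit in (16) and the copy step in (17) can be sketched in a few lines of Python. The sketch uses the fact that the trace in (16) separates over the columns of A, so each rho_i has the closed form rho_i = -(d_i^* P a_i) / (d_i^* P d_i) with P = I - E_s E_s^*. This closed form is derived here, not quoted from the paper. The setup is a simplified, noise-free stand-in for the example above: the signatures are placed exactly on the generalized manifold, and the values in `rho_true` are hypothetical.

```python
import numpy as np

def steering(m, spacing_wl, theta):
    """ULA response a(theta) with spacing given in wavelengths."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing_wl * k * np.sin(theta))

def steering_derivative(m, spacing_wl, theta):
    """d(theta) = da/dtheta for the ULA response."""
    k = np.arange(m)
    return (2j * np.pi * spacing_wl * k * np.cos(theta)
            * steering(m, spacing_wl, theta))

def fit_rho(Es, thetas, m, spacing_wl):
    """Least squares solution of (16), one rho per source."""
    P = np.eye(m) - Es @ Es.conj().T        # projector onto the noise subspace
    rho = np.empty(len(thetas), dtype=complex)
    for i, th in enumerate(thetas):
        a = steering(m, spacing_wl, th)
        d = steering_derivative(m, spacing_wl, th)
        rho[i] = -(d.conj() @ P @ a) / (d.conj() @ P @ d)
    return rho

def copy_signals(A, X):
    """Deterministic weights of (17): s_hat = (A^* A)^{-1} A^* x."""
    return np.linalg.solve(A.conj().T @ A, A.conj().T @ X)

rng = np.random.default_rng(2)
m, spacing_wl = 8, 0.5
thetas = np.deg2rad([0.0, 20.0])
rho_true = np.array([0.02 + 0.01j, -0.03 + 0.02j])    # hypothetical values
A_pt = np.column_stack([steering(m, spacing_wl, th) for th in thetas])
D = np.column_stack([steering_derivative(m, spacing_wl, th) for th in thetas])
V = A_pt + D * rho_true                     # signatures on the generalized manifold

S = rng.standard_normal((2, 100)) + 1j * rng.standard_normal((2, 100))
S[0] *= 10.0                                # first source 20 dB stronger
X = V @ S                                   # noise-free snapshots

# Signal subspace from the sample covariance (exact here, since no noise).
eigval, eigvec = np.linalg.eigh(X @ X.conj().T / X.shape[1])
Es = eigvec[:, -2:]

rho_hat = fit_rho(Es, thetas, m, spacing_wl)
S_point = copy_signals(A_pt, X)                   # point source model
S_gen = copy_signals(A_pt + D * rho_hat, X)       # generalized manifold
err_point = np.linalg.norm(S_point[1] - S[1]) / np.linalg.norm(S[1])
err_gen = np.linalg.norm(S_gen[1] - S[1]) / np.linalg.norm(S[1])
```

In this idealized setting the generalized manifold recovers the weaker signal essentially exactly, while the point source model leaves residual cross-talk from the stronger signal; with noise and signatures drawn from (2), the gain would be statistical rather than exact.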


Figure 3. Signal to interference and noise ratio for the weakest
signal as a function of angular spread.


The method described above makes no use of available a priori
information on the source signals; for example, training and preamble
sequences are often present in digital communication systems. In
[7, 25], an approach is formulated which exploits both temporal and
spatial information. In [3], it is shown how this method may be
integrated with the Viterbi algorithm to perform accurate symbol
detection in mobile communications using the GSM standard.

² The channel realizations are normalized to have norm $\sqrt{8}$.

3.1.2 Non Directional Information

When attempting to separate multiple signals or suppress interference
without making use of directional information, some temporal
characteristics of the signal must be used. In [2, 26], a reference
signal is assumed available which may be correlated with the array
output to achieve signal separation. This reference signal may be a
known training sequence, a known code sequence [13], or may be
generated by feeding back decisions [8]. There are a number of
methods that make use of the constant modulus property or finite
alphabet of communication signals to separate them [1, 21]. In
general, these methods are concerned with the low rank channel model.
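
A reference-signal approach of this kind can be sketched as a least squares beamformer. The Python example below is a generic illustration and not the specific algorithm of [2, 26]: it assumes a known training sequence for one user, correlates it with the array output, and solves R_xx w = r_xs for the weights. The array size, noise level and BPSK symbols are hypothetical.

```python
import numpy as np

def reference_weights(X, s_ref):
    """Least squares weights w minimizing sum_t |s_ref(t) - w^H x(t)|^2.

    X is the (m, N) array data and s_ref the length-N reference signal.
    No array response model is used.
    """
    N = X.shape[1]
    Rxx = X @ X.conj().T / N                # sample covariance of the array output
    rxs = X @ s_ref.conj() / N              # cross-correlation with the reference
    return np.linalg.solve(Rxx, rxs)

rng = np.random.default_rng(3)
m, N = 4, 200
# Unknown, unstructured spatial signatures of two co-channel users.
V = rng.standard_normal((m, 2)) + 1j * rng.standard_normal((m, 2))
S = rng.choice([-1.0, 1.0], size=(2, N)) + 0j
noise = 0.01 * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
X = V @ S + noise

w = reference_weights(X, S[0])              # user 0's training sequence as reference
s_hat = w.conj() @ X                        # w^H x(t) for all snapshots
bit_errors = int(np.sum(np.sign(s_hat.real) != S[0].real))
rel_error = np.linalg.norm(s_hat - S[0]) / np.linalg.norm(S[0])
```

Because the weights are fitted to the data directly, the co-channel user is suppressed without any knowledge of directions or of the array manifold.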

As seen in the previous section, also in the case of a high rank
channel, the signal is low rank for an appropriate measurement model.
In [12, 18] the channel estimation problem for one signal in noise is
cast in a subspace framework. Using subspace based methods,
estimation algorithms are proposed and evaluated. In [6] the subspace
based methods are analyzed and performance bounds are derived.
Detecting the transmitted symbol sequence in the presence of multiple
users is described in [24]. The row space spanned by the signals is
first identified; this removes the effect of the channel (inter
symbol interference). The individual transmitted signals may be
separated by exploiting the finite alphabet property. In [21] the
detection is achieved by alternately making symbol decisions (from an
estimated channel) and then estimating the channel (based on a known
symbol sequence). A similar concept is described in [20] where
initial symbol decisions are used to reconstruct a reference signal
which in turn is used to estimate the channel and so on.

The problem of blind signal separation in this application is
difficult although it may be relevant when there is large uncertainty
surrounding the transmitted signal. In practical digital
communication systems, known bit sequences are always present in some
form to identify users and establish a communication link. Training
sequences may be used to obtain an initial estimate of the channel
and then a tracking mode takes over, updating the channel estimate
based on the demodulated and possibly remodulated signal.

3.2.  Down  Link  Processing 

Consider the simple low rank channel model in (1) which may be used
to characterize the down link spatial channel statistics as well.
However, in most current frequency division duplex (FDD) systems the
up and down link flat fading may be considered independent. If the
main objective is increased range, this does not pose a major
problem. However, the unobservable down link channel is one of the
main obstacles if the intention is to also increase system capacity.
An array could be employed at the mobile site as well, but in many
applications this is not considered a feasible solution. Another
alternative is to attempt to estimate the channel by employing
feedback [5]. This requires a complete redesign of protocols and
signaling and is probably only possible in environments which vary
very slowly in time. This technique may be feasible for movable
(rather than mobile) systems such as indoor wireless local area
networks.

If we are attempting to increase capacity in current FDD systems in
the down link, the information gained from the signal separation
techniques in the up link cannot be used directly. Since the channels
are not reciprocal, reusing an optimal weight vector obtained from
receive data in the transmit mode is not advisable. One should at
least attempt to transform the weights to the transmit frequency.
However, this is not a well conditioned problem unless an array
response model is introduced. When using an array model to transform
weight vectors, directional information is exploited. In [27], the
spatial signatures of the users are first estimated using temporal
information; then the directions may be extracted by applying a
parameterized array manifold. It should be noted that in [15], a
transmit scheme is proposed which does not use directional
information. The down link scheme is based on statistical information
estimated in the up link to take into account the unobservable
fading. However, the frequency duplex distance is not compensated
for, causing the system to degrade in the presence of line of sight
propagation.

In  time  division  duplex  (TDD)  systems,  the  up  and  down 
link  channels  can  be  considered  reciprocal  if  there  is  limited 
movement  between  receive  and  transmit.  Up  link  channel 
information may then be used to achieve spatially selective
transmission and thus increase capacity [10]. When the
channel is high rank, combined spatial-temporal processing
may be applied on the down link to increase capacity.
The estimated up link channel may be inverted in the sense
that the signals are appropriately pre-equalized and spatially
multiplexed at the base station to minimize inter symbol and
co-channel interference at the users [9].
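
The idea of inverting the estimated up link channel can be illustrated with a deliberately simplified sketch. It assumes a flat-fading (memoryless) channel, perfect reciprocity, two base station antennas and two single-antenna users; the channel values and all function names are hypothetical, and the temporal pre-equalization mentioned above is not modelled.

```python
# Minimal sketch of spatial pre-inversion (zero forcing) on the down link.
# Assumptions: flat fading, perfect TDD reciprocity, 2 antennas, 2 users.

def invert_2x2(h):
    """Return the inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# h_up[i][j]: hypothetical up link gain from user j to antenna i.
h_up = [[1.0 + 0.5j, 0.3 - 0.2j],
        [0.2 + 0.1j, 0.8 - 0.4j]]

# Reciprocity: the down link gain from antenna i to user j equals h_up[i][j],
# so the down link matrix (users x antennas) is the transpose.
h_down = [list(row) for row in zip(*h_up)]

# Transmit weights (antennas x users) chosen so that h_down * weights = I,
# i.e. each user receives only its own signal.
weights = invert_2x2(h_down)
effective = matmul(h_down, weights)
```

With these weights the effective channel seen by the users is the identity matrix, so co-channel interference vanishes in this idealized setting. Transmit power constraints and channel estimation errors are ignored here.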

The  efficient  use  of  the  spatial  dimension  in  current  FDD 
cellular  systems  with  high  mobility  requires  the  use  of  direc¬ 
tional  information.  Array  response  modeling  is  feasible  for 
medium  to  large  size  cells  with  high  placement  of  the  base 
station  antennas  avoiding  near  field  scattering. 

4  Down  Link  Capacity 

As  argued  previously,  the  down  link  is  likely  to  be  the 
limiting  factor  when  increasing  the  capacity  of  cellular  sys¬ 
tems.  This  is  mainly  due  to  the  independence  between  the  up 
and  down  link  channels  when  FDD  is  employed.  There  are 
two  main  approaches  for  increasing  capacity  with  antenna 
arrays.  The  frequency  reuse  distance  may  be  decreased  or 
multiple  mobiles  may  be  allocated  to  the  same  cell  (or  some 
combination  of  the  above).  In  [32],  the  down  link  capac¬ 
ity  problem  is  studied  for  FDD  systems  and  a  transmission 
scheme  is  proposed  based  on  channel  information  estimated 
on  the  up  link.  It  is  shown  that  when  inter  cell  nulling  is  not 
employed, multiple mobiles per cell is in general a more efficient
way of increasing capacity. Also, capacity depends
to a large degree on the spread angle of the mobiles. In
[30], a down link system using inter cell nulling and slow
up link power control is studied. In this case, reduced cluster
size provides the largest capacity increases. Note, however,
that this requires power control with good dynamic
range and direction estimation to users in neighboring cells.
Also, inter cell nulling may be quite difficult even in a synchronous
TDMA system when propagation delays are significant.
However, reducing the frequency reuse distance in
conjunction with frequency hopping and dynamic channel allocation
could reduce the requirements on frequency planning.

Figure 4. Possible configuration of an SDMA
system.

5  Summary 

Providing  adequate  coverage  and  sufficient  capacity  are 
two  challenging  problems  for  wireless  communication  sys¬ 
tems.  Antenna  arrays  at  the  base  stations  of  cellular  sys¬ 
tems  can  increase  range  compared  to  current  systems.  The 
capacity  problem  can  be  significantly  mitigated  by  spatial 
division  multiple  access  (SDMA)  techniques.  SDMA  sup¬ 
ports  multiple  connections  on  a  single  conventional  channel, 
based  on  spatial  reception  and  transmission  schemes  and/or 
decreased frequency reuse distance by reducing and rejecting
interference. Thus, capacity may be increased over current
wireless system implementations.

Abstract

The characteristics of the mobile radio channel vary
between the features of an AWGN-channel and those of a
Rayleigh fading channel.

The trellis codes known so far do not meet the requirements
of both these channels, because they are adapted
either to the AWGN or to the Rayleigh fading channel.
In this paper¹ we present new multiple trellis codes, which
are well suited for use in the AWGN-channel and the
Rayleigh channel and are thus especially suited for the mobile
radio channel.

A new measure of complexity is introduced which allows
a fair comparison between multiple trellis codes of different
dimensionalities. It is based on the number of algebraic
operations per decoded information bit.


1.  Introduction 

Since their invention in the early 1980s [1], trellis codes
have been used extensively to gain noise immunity in communication
systems. Trellis codes can be applied in systems
with high information rates and $R_b \geq 1$ information bit per
channel symbol, whereas the classical binary block and convolutional
codes are constrained to rates below 1.

The basic idea of trellis codes is simple: instead of transmitting
redundant bits as in the case of block and convolutional
codes, the size of the transmission alphabet $M$ is
increased such that $M > 2^{R_b}$, i.e. there are more symbols
for transmission than actually needed.

The redundancy of the increased symbol alphabet is exploited
in such a way that at each time slot $t_k$ only a subset of
all possible symbols is allowed for transmission. The valid
symbol sequences are created by the actual information bits
$b_{k,u}$ at time $t_k$ and the contents of a finite state machine. The
state at time slot $t_k$ is defined by the values of the previous
information bits $b_{k-1,u}, \ldots, b_{k-n,u}$ at times $t_{k-1}, \ldots, t_{k-n}$.

The remaining task is to define a criterion for selecting
the allowed symbol sequences and to find appropriate finite
state machines.

¹ This work was sponsored by the Austrian Science Foundation (FWF),
grant 10294 OPY.

For the AWGN-channel the problem was solved by
G. Ungerbock [1] - [3], who actually invented trellis coded
modulation (TCM). In [1] - [3] MPSK- and MQAM-alphabets
are used. The optimization criterion for trellis
codes in the AWGN-channel is the squared free Euclidean distance
$\Delta^2_{free}$ between allowed coded symbol sequences,
since the error performance at high values of the SNR is
lower bounded by:

$$P_b \geq Q\left(\sqrt{\frac{\Delta^2_{free}}{2N_0}}\right) \qquad (1)$$

where $N_0/2$ is the two-sided spectral density of the additive
white Gaussian noise and $Q(\cdot)$ the complementary Gaussian
distribution function.

In [4] the idea of trellis codes for the AWGN-channel is
extended to multiple symbols, i.e. $k$-tuples of MPSK- or
MQAM-symbols are assigned to the trellis branches.

For the Rayleigh fading channel the TCM-problem was
solved by D. Divsalar et al. [5], [6]. It turned out that the
minimum number of distinct symbols ($L_{eff}$) between any
two coded symbol sequences has to be maximized for minimizing
the error rate:

$$P_b \propto \frac{1}{\beta^2\,\mathrm{SNR}^{L_{eff}}} \qquad (2)$$

The effective length ($L_{eff}$) dominates the slope of the bit
error rate curve as a function of SNR. The parameter $\beta^2$ is
the so called product distance, i.e. the product of all non-zero
Euclidean branch distances along any two trellis paths. It has
to be maximized as well. Conventional trellis codes with one
symbol per trellis branch and trellis codes with more than
one symbol per trellis branch (multiple trellis codes), which
are especially designed for the Rayleigh fading channel, can
be found in [5] - [10].
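
The three quantities discussed so far (squared Euclidean distance, effective length and product distance) can be evaluated for a single pair of MPSK symbol sequences as in the following sketch. It assumes unit-energy MPSK symbols and takes the product over squared branch distances; the function names and the two example sequences are hypothetical. The code parameters themselves are the minima of these quantities over all pairs of valid sequences, which this sketch does not search for.

```python
import cmath

def mpsk(index, m):
    """Unit-energy MPSK symbol with the given index (0 .. m-1)."""
    return cmath.exp(2j * cmath.pi * index / m)

def path_metrics(seq_a, seq_b, m):
    """Return (squared Euclidean distance, number of differing symbols,
    product of the non-zero squared branch distances) for two paths."""
    sq_dists = [abs(mpsk(a, m) - mpsk(b, m)) ** 2 for a, b in zip(seq_a, seq_b)]
    nonzero = [d for d in sq_dists if d > 1e-12]
    product = 1.0
    for d in nonzero:
        product *= d
    return sum(sq_dists), len(nonzero), product

# Two hypothetical QPSK sequences that differ in two positions.
euclid, length, product = path_metrics([0, 0, 0, 0], [1, 0, 2, 0], m=4)
```

For this pair the squared distances per position are 2, 0, 4 and 0, so the pair contributes a squared Euclidean distance of 6, a length of 2 and a product distance of 8.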

It can be seen from eqn.(2) that on the one hand the minimum
squared (free) Euclidean distance $\Delta^2_{free}$ is of no primary
importance for the error performance in the Rayleigh fading
channel and on the other hand the effective length and product
distance do not show up in eqn.(1) for the BER of the
AWGN-channel. Hence the optimum trellis codes for the
two channels are obtained by different optimization criteria.
As a consequence, trellis codes for the AWGN-channel
do not perform well in the Rayleigh fading channel and vice
versa.

The mobile radio channel, as met in wireless communications,
has been of growing economic interest in
recent years. Any communication system has to cope with
short and long term time variant channel characteristics.
The short term fluctuations caused by multipath propagation
result in deep fades of the signal power. These deep
fades can be combated with equalization, interleaving,
and coding techniques. However, there are also long
term fluctuations in the characteristics of the mobile radio
channel depending on the actual position of the base and the
mobile station: in urban areas the signal is affected by multipath
propagation and thus the fading channel is Rayleigh-like.
In rural areas there are only a few scatterers, hence the
channel is similar to an AWGN-channel.

Therefore it is clear that none of the trellis codes known so far
is suited for the mobile radio channel, which requires trellis
codes optimized for the AWGN-channel and the Rayleigh
fading channel simultaneously.

The  rest  of  the  paper  is  organized  as  follows:  In  section  2 
we  will  present  new  construction  principles  for  trellis  codes 
which  are  well  suited  for  the  use  in  the  AWGN-channel  and 
the  Rayleigh  fading  channel.  Finally,  in  section  3  a  new 
definition  of  trellis  complexity  will  be  given. 

2.  Code  construction 

Any code that performs well in the AWGN-channel and
the Rayleigh fading channel will also perform well in the
mobile radio channel, which fluctuates between AWGN and
Rayleigh characteristics. The optimization criterion for
our new trellis codes is to maximize the three parameters
$L_{eff}$, $\Delta^2_{free}$ and $\beta^2$ simultaneously. Our results are based on
three new supports:

•  The construction rules for the MPSK-subset decomposition
presented in [6], which give optimum trellis codes for fading
channels, have been rearranged. The
new rules can be interpreted as a non linear labeling
method of the MPSK-signals, in contrast to the linear
labeling method presented in [6].

The new subset decomposition results in an increased
free distance $\Delta^2_{free}$ and an increased product distance $\beta^2$
compared to the codes known so far.

•  A completely new mapping of multiple MPSK-symbols
to trellis branches increases the effective
length $L_{eff}$ and the free distance $\Delta^2_{free}$. The idea
behind the new method is to optimize both the distances between
emerging branches and the distances between reemerging
branches of all trellis states (in the codes known so far
only the distances between emerging branches
are optimized).

•  A  new  bit-to-symbol  mapping  helps  to  minimize  the 
number  of  bit  errors  per  error  event.  Binary  infor¬ 
mation  sequences  with  a  small  Hamming  distance 
correspond  to  coded  symbol  sequences  with  a  small 
Euclidean  distance  and  a  short  effective  length. 

•  Sometimes these three supports are supplemented with
a fourth action: instead of doubling the symbol alphabet
needed for uncoded transmission, as applied
in all systems known so far (cf. [1]), a fourfold larger
alphabet is used, which allows the effective
length to be increased and minimizes the number of nearest neighbor
sequences.

The  multiple  MPSK  trellis  codes  constructed  according  to 
these  specified  rules  outperform  the  so  far  known  codes.  In 
the  AWGN-channel  the  average  gain  of  SNR  is  approx.  1  dB 
at  Pk  =  10"^,  in  the  Rayleigh  fading  channel  it  is  approx.  3 
dB. In fig.(1) the simulation results of two trellis codes are
shown. One has been constructed according to our new
rules and the other one according to the rules given in [6].
In both cases 4D-QPSK symbols (pairs of QPSK-symbols)
are used for transmission at rate $R_b = 1$ information bit per
QPSK-symbol. The number of trellis states equals $S = 4$.

The code parameters are: $L_{eff} = 4$, $\Delta^2_{free} = 10$, $\beta^2 = 32$
for the new code and $L_{eff} = 3$, $\Delta^2_{free} = 6$, $\beta^2 = 8$ for the
other code. It can be seen in fig.(1) that our new code clearly
outperforms  the  other  code:  at  Pb  ~  10“^  the  coding  gain 
equals  2.5  dB  in  the  AWGN-channel  and  more  than  4  dB  in 
the  Rayleigh  fading  channel. 

A  more  detailed  description  of  our  new  construction  rules 
and  a  complete  list  of  all  new  codes  can  be  found  in  [11]. 

3.  Code  complexity 

Complexity is an important aspect of trellis codes. All
definitions of TCM-complexity given so far consider the
connectivity of the trellis graph regardless of whether the connections
contain parallel transitions or not. Following the
notation of G. Ungerbock [3] this complexity can be written
as:


where $S$ is the number of trellis states, $n$ the number of coded
bits per transition, and $k$ the number of channel symbols per
transition.

Figure 1. Simulation results of two 4D-QPSK
trellis codes with four states in the AWGN-channel
and the Rayleigh fading channel.

In  contrast  to  this  definition  we  propose  a  more  prac¬ 
tical  definition  of  TCM-complexity  which  emphasizes  the 
computational  effort  of  the  decoding  process.  The  newly 
defined  trellis  complexity  Kb  measures  the  number  of  al¬ 
gebraic  operations  per  decoded  information  bit.  It  can  be 
written  as: 


•  M  4-  5  •  2^^^’  {k  +  \  /x) 

= - m, - ' 

where $M$ is the size of the symbol alphabet, and $x$ the
number of parallel branches per transition. The derivation
of eqn.(4) can be found in [11]. It is also shown in [11] that
the complexity $\mathcal{K}$, eqn.(3), favors trellis codes with large $k$
in an unfair way. This is overcome by the new definition of
$K_b$, which shows that the computational effort of the decoding
process grows exponentially with $k$.

However, this drawback of trellis codes with high multiplicity
is alleviated by the fact that these codes are well
suited for efficient parallel implementation in real time systems
[12].

Abstract: A formula for the power spectral density
of  maximum  entropy  M-ary  (d,k)  constrained 
sequences  is  given.  The  formula  for  spectrum  is 
derived  by  using  the  method  of  a  difference 
equation  assigned  to  a  Markov  chain  generating  M- 
ary  (d,k)  sequences. 

Introduction 

Multi-amplitude (d,k) codes were introduced
by French and Wolf [4], Barman [3], and
McLaughlin [7]. One application of these
codes is electron trapping optical recording
channels [7]. The M-ary (d,k) code stream $\{a^{(n)}\}_{n \in Z}$
consists of symbols from an alphabet
$A=\{A_1,\ldots,A_M\}$ of size M. The lengths of the
sequences of consecutive like symbols (phrases) are
constrained, and must be in the range [d+1,k+1]. In
this letter we consider the power spectral density
(spectrum) of the M-ary (d,k) codes. The formula for the
spectrum of maximum entropy M-ary (d,k) codes is
derived.

The generation of a constrained sequence $\{a^{(n)}\}_{n \in Z}$
is modeled by reading off the state labels
during a random walk through a finite directed
Moore-type graph [2, Chpt. 3]. When the probabilities
between states are specified, the sequence of
graph states $\{s^{(n)}\}_{n \in Z}$ becomes a Markov chain, and the
sequence $\{a^{(n)}\}_{n \in Z}$ a memoryless function of the
Markov chain, $a^{(n)}=h(s^{(n)})$ [1, Chpt. 12].

To derive a closed form expression of the
power spectral density of a memoryless function of
the Markov chain we use the difference equation
method described by Vasic in [8] and [9]. This is
primarily a numerically efficient algorithm for
spectrum computation, but the D-domain version of
this method simplifies the algebraic manipulation
of the expression for the autocorrelation function
in cases when a graph has some specific structure.


The  Markov  Chain  Generating  M-ary  (d,k) 
Sequences 

The transition diagram of the Moore-type
Markov chain generating M-ary (d,k) sequences is
shown in Fig. 1. The diagram consists of M identical
branches modeling the generation of the phrases of
different symbols $A_m$, $1 \le m \le M$. The states are drawn
as circles and denoted by pairs $(m,i)$, $1 \le m \le M$,
$1 \le i \le k+1$. The label inside the circle denotes the symbol
generated from this state ($A_m=h((m,i))$ for all $i$). An
edge entering the state $(m,i)$ from another branch means starting the
generation of a phrase of length $i$ of symbols $A_m$.
The edge labels are of the form $pD$, where $p$ is the
transition probability and $D$ is the time delay
operator. The Markov chain described by the
transition diagram given in Fig. 1 is stationary and
ergodic [1, Chpt. 12]. The stationary probabilities of the
states and the transition probabilities between states of
the Markov chain are given by the following
lemma and theorem.

Lemma 1: The M-ary (d,k) constrained sequence
achieves maximal information rate if the phrase
lengths are i.i.d. random variables with probability
distribution $P_i=(M-1)M^{-Ci}$, $d+1 \le i \le k+1$, wherein $C$
is a constant satisfying

$$\sum_{i=d+1}^{k+1} M^{-Ci} = 1/(M-1). \qquad (1)$$

Proof: The proof is a straightforward generalization of
Theorem 1 of Zehavi and Wolf [10]. We can
also prove this lemma by considering the variable
length graph [6] of the M-ary (d,k) constraint. This
graph contains k+1-d loops, and the labels of the loops
are of the form $(M-1)D^i$, $d+1 \le i \le k+1$ (see [6]).

Remark 1: $C$ is the channel capacity of the M-ary (d,k)
constraint in M-ary units of information.
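
As a numerical illustration of Lemma 1, the constant $C$ in (1) can be found by bisection, since the left-hand side of (1) decreases monotonically in $C$ and the root lies between 0 and 1. The sketch below is only an illustration under these observations; the function names and the tolerance are hypothetical choices.

```python
def capacity(m, d, k, tol=1e-12):
    """Solve sum_{i=d+1}^{k+1} m**(-C*i) = 1/(m-1) for C by bisection."""
    target = 1.0 / (m - 1)

    def lhs(c):
        return sum(m ** (-c * i) for i in range(d + 1, k + 2))

    lo, hi = 0.0, 1.0          # lhs(0) >= target and lhs(1) < target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:  # sum still too large, so C must grow
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phrase_length_distribution(m, d, k):
    """Return C and the maxentropic phrase length probabilities P_i."""
    c = capacity(m, d, k)
    return c, {i: (m - 1) * m ** (-c * i) for i in range(d + 1, k + 2)}

# Binary example with phrase lengths 1 and 2 (d = 0, k = 1).
c_binary, p_binary = phrase_length_distribution(2, 0, 1)
```

For the binary example the equation reduces to $x + x^2 = 1$ with $x = 2^{-C}$, so $C$ equals the base-2 logarithm of the golden ratio, about 0.6942, and the probabilities $P_i$ sum to one.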

Theorem 1: For the Markov chain generating M-ary
(d,k) constrained sequences of maximal entropy we
have:

a) The stationary probabilities of the states are
$\Pr\{s^{(n)}=(m,i)\} = \pi_i = (1/ML)(P_i + \ldots + P_{k+1})$,
$1 \le i \le k+1$, wherein $L$ is the average phrase length
$L=(d+1)P_{d+1} + \ldots + (k+1)P_{k+1}$.

b) The transition probability to $(m,i)$ from any state
$(l,1)$, $l \ne m$, is
$\Pr\{s^{(n)}=(m,i) \mid s^{(n-1)}=(l,1)\} = (M-1)^{-1}P_i$.

Proof: The proof is given in Appendix A.
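
Statement a) can be checked numerically against the balance equations of the chain: state $(m,i)$ is entered from $(m,i+1)$ with probability one and from the last state of any other branch with probability $P_i/(M-1)$. The sketch below uses a hypothetical phrase length distribution (not a maxentropic one) and hypothetical function names; it only illustrates the formula.

```python
def stationary_probabilities(m, phrase_probs):
    """phrase_probs maps phrase length i to P_i.
    Returns the mean phrase length L and {i: pi_i} from statement a)."""
    longest = max(phrase_probs)
    mean_length = sum(i * p for i, p in phrase_probs.items())
    pi = {}
    for i in range(1, longest + 1):
        tail = sum(p for j, p in phrase_probs.items() if j >= i)
        pi[i] = tail / (m * mean_length)
    return mean_length, pi

M = 3
P = {2: 0.5, 3: 0.3, 4: 0.2}      # hypothetical P_i with d = 1, k = 3
L, pi = stationary_probabilities(M, P)

# Balance equations: pi_i = pi_{i+1} + P_i * pi_1 (terms vanish where undefined).
max_gap = max(abs(pi.get(i + 1, 0.0) + P.get(i, 0.0) * pi[1] - pi[i])
              for i in pi)
branch_total = sum(pi.values())   # probability of being in one branch
```

Each of the $M$ branches carries the same total probability, so `branch_total` should equal $1/M$.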

Power  Spectral  Density  Formula 

To derive a closed form expression of the
power spectral density of the stream $\{a^{(n)}\}_{n \in Z}$, which
is a memoryless function of the Markov chain
$\{s^{(n)}\}_{n \in Z}$ given by the transition diagram shown in
Fig. 1, we use the difference equation method
described by Vasic in [8] and [9]. Let us denote the
Markov chain state set by $S$, and the one step transition
probabilities between states $u,v \in S$ by
$p_{u|v}=\Pr\{s^{(n)}=u \mid s^{(n-1)}=v\}$. Let $\Psi_u$ be the set of states
from which the Markov chain can pass into state $u$, let
$\pi_u$ be the stationary probability of state $u$, and let
symbol $A_u$ be generated from state $u$, i.e. $A_u=h(u)$.
Then the difference equation method is summarized
in the following theorem.

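Theorem 2 [8],[9]: The autocorrelation function
$r^{(n)}=E\{a^{(j)}a^{(j+n)}\}$, $j \in Z$, of the stream
$a^{(n)}=h(s^{(n)})$, where $\{s^{(n)}\}$ is a Markov chain, can be
obtained from the following set of difference
equations

$$f_u^{(0)} = A_u \pi_u$$

$$f_u^{(n)} = \sum_{v \in \Psi_u} p_{u|v}\, f_v^{(n-1)}, \quad n > 0 \qquad (2)$$

$$r^{(n)} = \sum_{u \in S} A_u f_u^{(n)}$$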
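
The difference equation method of Theorem 2 translates directly into a short iteration. The following sketch assumes the chain is given by a full transition matrix (entries for states outside the predecessor set are simply zero) and uses a hypothetical two-state example; the function name and argument layout are illustrative and not taken from [8] or [9].

```python
def autocorrelation(symbols, stationary, transition, max_lag):
    """Autocorrelation r(0..max_lag) of a memoryless function of a Markov chain.

    symbols[u]        symbol A_u generated from state u
    stationary[u]     stationary probability of state u
    transition[u][v]  Pr{s(n) = u | s(n-1) = v}
    """
    states = range(len(symbols))
    # Initial condition: f_u(0) = A_u * pi_u.
    f = [symbols[u] * stationary[u] for u in states]
    r = []
    for _ in range(max_lag + 1):
        r.append(sum(symbols[u] * f[u] for u in states))
        # One step of the difference equation: f_u(n) = sum_v p(u|v) f_v(n-1).
        f = [sum(transition[u][v] * f[v] for v in states) for u in states]
    return r

# Hypothetical check: two states that alternate deterministically and
# generate +1 and -1, so the autocorrelation must alternate in sign.
r_alt = autocorrelation([1.0, -1.0], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], 3)
```

For the alternating example the result is 1, -1, 1, -1 for lags 0 to 3, as expected for a stream that changes sign at every step.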

Since the stream $\{a^{(n)}\}_{n \in Z}$ is a cyclostationary,
but not wide sense stationary, random process, the
Wiener-Khinchine theorem cannot be used for
finding its spectrum. The problem is usually avoided by
considering the phase averaged process $\{a^{(n+\theta)}\}_{n \in Z}$
[5]. Since the stream is real valued it
follows that $r^{(n)}=r^{(-n)}$ and the spectrum of the
stream is $\Phi(f)=[r^{(0)}+R(D)+R(D^{-1})]$,
where $R(D)$ is the D transform of $r^{(n)}$,
$R(D)=\sum_{n>0} r^{(n)}D^n$, and $D=\exp(j2\pi f)$.

By applying the system of equations (2)
given by Theorem 2 to the Markov chain
generating M-ary (d,k) constrained sequences in Fig.
1, with stationary and transition probabilities given
by the statements of Theorem 1, we obtain the formula
for the spectrum.

Theorem 3: The power spectral density of the M-ary
(d,k) constrained stream $\{a^{(n+\theta)}\}_{n \in Z}$ is given by

1  |G(D)|+(A/-l)Re{G(Z))}-(M-l) 

Lsin'(;?f)  \M-\  +  G{D)f 

where $\theta$ is a random variable
uniformly distributed over [0,1).

Proof: The proof is given in Appendix B.

As an illustration, Fig. 2 shows the spectra of
signals modulated by maxentropic M-ary (d,k)
sequences for different alphabet sizes M.
A rectangular pulse shape is assumed. We can
observe that the content of low frequency spectral
components increases with increasing M.

Conclusion 

By using the difference equation method, we
have derived the closed form expression for the
spectrum of M-ary maxentropic (d,k) codes. The
expression for R(D) is very similar to the one for binary
(d,k) codes, obtained by Zehavi and Wolf [10] and
Gallopoulos, Heegard and Siegel [5]. Instead of
1+G(D) in the denominator of the formula for R(D)
for binary codes we have the term M-1+G(D).


Appendix  A 

Proof of Theorem 1: When a phrase of symbols $A_l$
is completed, the probability of selecting some other
amplitude $A_m$ ($A_m \ne A_l$) is $1/(M-1)$. So, the labels of
all edges incoming to the branch $m$ have the term
$1/(M-1)$. In other words
$\Pr\{s^{(n)} \in \{(m,i) \mid d+1 \le i \le k+1\} \mid s^{(n-1)}=(l,1)\}=1/(M-1)$.
When a branch is selected,
the probability of generating a phrase of length $i$
is $P_i$, $d+1 \le i \le k+1$. Obviously $P_i=0$ for $1 \le i < d+1$
and for $i > k+1$. According to Lemma 1, for
maxentropic sequences we have $P_i=(M-1)M^{-Ci}$,
$d+1 \le i \le k+1$, so that

$$\Pr\{s^{(n)}=(m,i) \mid s^{(n-1)}=(l,1)\} = \frac{1}{M-1} \cdot P_i \qquad (A.1)$$

Since all M branches are equiprobable,
according to the Bayes formula, for the stationary
probabilities $\pi_i=\Pr\{s^{(n)}=(m,i)\}$, $1 \le i \le k+1$,
we can write (see Fig. 1)

$$\pi_{k+1} = P_{k+1} \cdot \pi_1$$

$$\pi_i = 1 \cdot \pi_{i+1} + P_i \cdot \pi_1, \quad d+1 \le i \le k \qquad (A.2)$$

$$\pi_i = 1 \cdot \pi_{i+1}, \quad 1 \le i \le d$$

By solving the system (A.2) we obtain
$\pi_i=(1/ML)\sum_{j=i}^{k+1} P_j$, where $L$ is the average
phrase length. Q.E.D.


Appendix  B 

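For the Markov source generating M-ary (d,k)
constrained sequences (Fig. 1) the initial conditions
in the difference equation method [8] are
$f^{(0)}_{m,i} = A_m \pi_i$, $1 \le i \le k+1$,
wherein $\pi_i$ are the state stationary
probabilities given by the statement of Theorem 1.
The autocorrelation function at $n=0$ is
$r^{(0)} = \sum_{m=1}^{M}\sum_{i=1}^{k+1} A_m^2 \pi_i = \frac{1}{M}\sum_{m=1}^{M} A_m^2$.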

The  D  transforms  of  difference  equations 
following  from  Fig.  1  are 

(D)  =  —  ■D-WJD)-'^PrDJ-'  + 

k+\ 

j=i 

where  WJD)=  Sm,  ^/.i 

By summing the expression for $F_{m,1}(D)$ over
all $m$, and after some algebraic manipulation we
have

W(D)  =  'Z  F„,j  {D)  =  G{D)-  W(D)  + 

(B-1) 

_1_  l-G(D) 

ML  \-D  ” 

where  G(D;=S*;l,^r^'- 


Without loss of generality the bipolar, zero
mean stream can be assumed. For the maxentropic case
all symbols are equiprobable, so we have that
$A_1+\ldots+A_M=0$. Since $G(D) \ne 1$, from (B.1) we have
$W(D)=0$. From this, since $W(D)=W_m(D)+F_{m,1}(D)$, it


follows 

From  the  expression  (B.l)  for  W(D)  it 

follows 

^  t  M-\  +  G{D) 


If  D  transform  of  the  autocorrelation  function  is 
written  in  the  form  R(D)=A]Rj(D)+...+  Afjij^/D), 
where  RJD)=F„j(D)+...+F„^.j(D),  then,  after 
some  manipulation,  we  obtain 


RSD)  =  - 


A„  2D  l-G(D) 
ML{\-D)-  M-\  +  G{D) 


+ 


1 

M \-D 


The  D  transform  of  whole  autocorrelation  function 
is 

1  1  2D  l-G(Z)) 

X-D"  L{\-D)-  M-\  +  G{D) 


From the above expression and the definition of the
spectrum, the statement of Theorem 3 follows
directly. Q.E.D.

Abstract

In this paper a digital system consisting of two parts
is considered. The first part is an optical communication
system with ASK modulation. The optical receiver is
a heterodyne asynchronous ASK receiver.
The signal is then transmitted in baseband through the second
part of the system (a coaxial cable). The probability density functions
for both hypotheses ($H_0$ and $H_1$), the optimal
threshold and the performance of the system are determined for
the proposed system. The optimal threshold is determined under
the condition that the error probability is minimal.


1  Introduction 

The digital system investigated in this paper
has two parts. The first part is an optical communication
system with ASK modulation. The optical receiver
is a heterodyne asynchronous ASK receiver.
The signal is transmitted in baseband (for example
over a coaxial cable) through the second part
of  system.  System  ought  to  be  designed  on  condition 
that  error  probability  is  less  than  beforhand  given  value 
(  for  example  10"'’^).  As  optical  system  could  be  de¬ 
signed  that  error  probability  is  slight  enough,  estima¬ 
tor  is  not  strictly  behind  envelope  detector.  Signal  is 
transmitted in baseband over a section of coaxial
cable, and the estimation is done at the end of this section.
The noise components in the optical receiver (shot noise, thermal
noise etc.) are approximated by white Gaussian
noise. The disturbances that appear during transmission
over the coaxial cable can be represented
as the sum of additive white Gaussian noise and crosstalk.
The crosstalk is modelled as a sinusoidal disturbance with
constant amplitude and uniformly distributed phase.

The probability density functions for both hypotheses
($H_0$ and $H_1$), the optimal threshold and the performance
of the system are determined. The optimal threshold is determined
under the condition that the error probability is minimal.


2  Performance  determination 

The first part of the system is an optical
communication system and it contains an optical ASK
transmitter, an optical cable, a 3 dB coupler (balanced detector),
two photodiodes, a band pass filter and an envelope
detector. The second part of the system, in baseband, contains
a coaxial cable, a low-pass filter, a sampler and an estimator.
The optical receiver is a heterodyne
asynchronous ASK receiver.

In Fig. 1, $i(t)$ represents the crosstalk, which is modelled
as a sinusoidal disturbance $i(t) = A_i \cos(\omega_0 t + \theta_i)$, where
the phase $\theta_i$ is uniformly distributed, $p(\theta_i) = 1/2\pi$,
$|\theta_i| < \pi$. The addition of the LO lightwave to the incoming optical
signal is implemented by using a balanced detector. The output
signals of the balanced detector ($X_1$ and $X_2$) illuminate
separate photodiodes and generate the currents

$$i_1(t) = R|X_1|^2 + n_1(t) \qquad (1)$$

$$i_2(t) = R|X_2|^2 + n_2(t) \qquad (2)$$

where $n_1(t)$ and $n_2(t)$ are shot noise components with
power spectral densities $S_{n1}(\omega) = qR|X_1|^2$ and
$S_{n2}(\omega) = qR|X_2|^2$. $R = \eta q/(h\nu)$ is the responsivity of the
photodiode; $\eta$ is the quantum efficiency and $q$ is the charge of
an electron. The total current is determined as [1]

$$i_1(t) - i_2(t) = R(|X_1|^2 - |X_2|^2) + n(t) = 2R\,\mathrm{Re}\{SL^*\} + n(t) \qquad (3)$$

$\mathrm{Re}\{\cdot\}$ is the real part of a complex number, and $n(t) = n_1(t) - n_2(t)$
is approximated as a zero-mean white
Gaussian noise with PSD

$$S_n(\omega) = qR(|X_1|^2 + |X_2|^2) = qR(|S|^2 + |L|^2) \qquad (4)$$

As the incoming optical signal has the form

$$S = a\sqrt{P_s}\,e^{j\omega_0 t}, \quad a \in \{0,1\} \qquad (5)$$

and the LO lightwave is

$$L = \sqrt{P_{lo}}\,e^{j\omega_{lo} t} \qquad (6)$$

the total current will be

$$i_1(t) - i_2(t) = a\,2R\sqrt{P_s P_{lo}}\cos[(\omega_0 - \omega_{lo})t] + n(t) = a\,2R\sqrt{P_s P_{lo}}\cos(\omega_{IF} t) + n(t) \qquad (7)$$

Fig. 1.

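So, the problem is reduced to the classical detection problem
of a known signal in additive white Gaussian noise
[2], [3]. As has been shown in [3], the probability density
function of the stochastic variable $Y$ (behind the envelope detector)
is a Rayleigh distribution when $a = 0$

$$p_Y(y/H_0) = \frac{y}{\sigma^2}\,e^{-y^2/2\sigma^2}, \quad y > 0 \qquad (8)$$

and a Ricean distribution when $a = 1$

$$p_Y(y/H_1) = \frac{y}{\sigma^2}\,e^{-(y^2+A^2)/2\sigma^2}\,I_0\!\left(\frac{Ay}{\sigma^2}\right), \quad y > 0 \qquad (9)$$

where $\sigma^2 = qRP_{lo}$ and $A = 2R\sqrt{P_s P_{lo}}$.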
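
To make the role of the densities (8) and (9) in the threshold choice concrete, the sketch below searches for the threshold that minimizes the error probability directly behind the envelope detector. It is only an illustration under stated assumptions: equally likely hypotheses, no coaxial cable noise and no crosstalk, a simple grid search, and hypothetical function names and parameter values. It is therefore not expected to reproduce the threshold reported for the complete system.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 via its power series."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def rayleigh_pdf(y, sigma):
    """Density (8), hypothesis H0."""
    return y / sigma ** 2 * math.exp(-y * y / (2 * sigma ** 2))

def rice_pdf(y, a, sigma):
    """Density (9), hypothesis H1."""
    return (y / sigma ** 2 * math.exp(-(y * y + a * a) / (2 * sigma ** 2))
            * bessel_i0(a * y / sigma ** 2))

def best_threshold(a, sigma, steps=1000):
    """Grid search over thresholds in (0, a] assuming equal priors."""
    h = a / steps
    miss = 0.0                          # running integral of the Rice density
    prev = rice_pdf(0.0, a, sigma)
    best_t, best_pe = 0.0, 0.5
    for n in range(1, steps + 1):
        t = n * h
        cur = rice_pdf(t, a, sigma)
        miss += 0.5 * (prev + cur) * h  # trapezoidal rule
        prev = cur
        false_alarm = math.exp(-t * t / (2 * sigma ** 2))  # Rayleigh tail
        pe = 0.5 * (false_alarm + miss)
        if pe < best_pe:
            best_t, best_pe = t, pe
    return best_t, best_pe

A, SIGMA = 6.0, 1.0                     # hypothetical amplitude and noise level
threshold, error_prob = best_threshold(A, SIGMA)
```

In this simplified setting the optimum lies slightly above $A/2$ and moves towards $A/2$ as $A/\sigma$ grows.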

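The disturbances that appear during transmission
over the coaxial cable can be represented as the
sum of additive zero-mean white Gaussian noise (with
variance $\sigma_1^2$) and crosstalk. The crosstalk is modelled as a
sinusoidal disturbance with constant amplitude ($A_i$)
and uniformly distributed phase ($\theta_i$). So, the conditional
probability density function of the stochastic variable $Z$
(at the reception point) for the case of hypothesis $H_0$ can
be determined as

$$p_Z(z/y,\theta_i) = \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left(-\frac{(z - y - A_i\cos\theta_i)^2}{2\sigma_1^2}\right) \qquad (10)$$

where $\theta_i$ has a uniform distribution and $y$ has a Rayleigh
distribution, and the PDF is

$$p_{Z/H_0}(z) = \int_{-\pi}^{\pi}\int_0^{\infty} \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left(-\frac{(z - y - A_i\cos\theta_i)^2}{2\sigma_1^2}\right)\frac{y}{\sigma^2}\,e^{-y^2/2\sigma^2}\,\frac{1}{2\pi}\,dy\,d\theta_i \qquad (11)$$

In the same way, the PDF of the stochastic variable $Z$ (at the
reception point) for the case of hypothesis $H_1$ is

$$p_{Z/H_1}(z) = \int_{-\pi}^{\pi}\int_0^{\infty} \frac{1}{\sigma_1\sqrt{2\pi}}\exp\left(-\frac{(z - y - A_i\cos\theta_i)^2}{2\sigma_1^2}\right)\frac{y}{\sigma^2}\,e^{-(y^2+A^2)/2\sigma^2}\,I_0\!\left(\frac{Ay}{\sigma^2}\right)\frac{1}{2\pi}\,dy\,d\theta_i \qquad (12)$$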
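
The average over the uniformly distributed crosstalk phase in (11) and (12) only involves $\cos\theta_i$, so it can be approximated by Gauss-Chebyshev quadrature with the nodes $x_k = \cos(\pi(2k-1)/(2N))$, $k = 1, \ldots, N$, and equal weights $1/N$. The following sketch illustrates this step alone, with hypothetical function names; the remaining integral over $y$ is not treated here.

```python
import math

def chebyshev_nodes(n):
    """Nodes x_k = cos(pi * (2k - 1) / (2n)), k = 1 .. n."""
    return [math.cos(math.pi * (2 * k - 1) / (2 * n)) for k in range(1, n + 1)]

def phase_average(g, n):
    """Approximate E[g(cos(theta))] for theta uniform on (-pi, pi)."""
    return sum(g(x) for x in chebyshev_nodes(n)) / n

# Two checks with known answers: E[cos^2(theta)] = 1/2 and
# E[exp(cos(theta))] = I0(1), the modified Bessel function at 1.
mean_square = phase_average(lambda x: x * x, 8)
mean_exp = phase_average(math.exp, 16)
```

The rule is exact for polynomials in $\cos\theta_i$ of degree below $2N$, which is why a small number of nodes already gives accurate results for smooth integrands.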


If we set $\cos\theta_i = x$ and use the Chebyshev quadrature
expressions, (11) and (12) become

$$p_{Z/H_0}(z) = \frac{1}{N\sigma_1\sqrt{2\pi}}\sum_{k=1}^{N}\int_0^{\infty} \frac{y}{\sigma^2}\exp\left(-\frac{y^2}{2\sigma^2} - \frac{(z - y - A_i x_k)^2}{2\sigma_1^2}\right)dy \qquad (13)$$

$$p_{Z/H_1}(z) = \frac{1}{N\sigma_1\sqrt{2\pi}}\sum_{k=1}^{N}\int_0^{\infty} \frac{y}{\sigma^2}\exp\left(-\frac{y^2+A^2}{2\sigma^2} - \frac{(z - y - A_i x_k)^2}{2\sigma_1^2}\right)I_0\!\left(\frac{Ay}{\sigma^2}\right)dy \qquad (14)$$

where

$$x_k = \cos\left(\frac{\pi}{2N}(2k - 1)\right), \quad (k = 1, 2, \ldots, N) \qquad (15)$$

The PDF for hypothesis $H_0$ is given graphically in Fig.
2. $A_i/\sigma$ and $A_i/\sigma_1$ are parameters.

The PDF for hypothesis $H_1$ is given graphically in Fig.
3. $A/\sigma$ and $A_i/\sigma_1$ are parameters. The system ought to
be designed under the condition that the error probability is not
greater than a value given beforehand. The dependence of the error
probability on the threshold is drawn in Fig. 4. The optimal
threshold is determined. It is 0.65A. The dependence of the error
probability on the signal to noise ratio ($20\log(A/\sigma)$, in
dB), for this optimal threshold, is drawn in Fig. 5.

3 Conclusion

Based on the obtained results for the proposed system,
we can conclude that this system can be applied in
practice.

References

[1] "Performance of Coher-", Proceedings of the IEEE,
vol. 78, no. 8, August 1990.

[2] A.D. Whalen, "Detection of Signals in Noise",
Academic Press, New York and London, 1971.

[3] N.R. Levin, "Teoreticheskie osnovi statisticheskoy
radiotehniki", Sovetskoe radio, Moscow 1974.

Abstract

A new technique for combining the LMS and LMF cost functions
is proposed in this contribution. The resulting stochastic
gradient adaptive algorithm uses a time varying mixing
parameter to optimise a combination of the above cost functions,
taking into consideration the noise statistics. Furthermore,
the behaviour of the proposed algorithm is analysed
and convergence conditions are established. Simulation
results verify the ability of the algorithm to adapt itself
to the noise characteristics, illustrate its enhanced performance
and support the theoretical analysis very well. The
continuous adaptation of the mixing parameter adds flexibility
and enables rapid response of the algorithm to
nonstationarities.

1.  Introduction 

Stochastic gradient adaptive algorithms are quite popular
and have been used in a wide variety of applications, including
array processing, system identification, channel equalization,
and echo and other interference cancellation, mainly
due to their inherent simplicity. LMS is the best known
such algorithm and its performance has been thoroughly
investigated in the literature (e.g., [4, 5, 7, 12]). Its most
attractive feature is its amenability to simple implementation,
while its main drawback is the degradation of its
performance due to eigenvalue spread.

The LMS algorithm belongs to a more general family, which
attempts to optimize (minimize) the following cost function
[12]

    $J = E\{e^{2K}(n)\}, \quad K = 1, 2, \ldots$    (1)

and is obtained by setting K equal to 1, i.e., $J = E\{e^2(n)\}$.
For K = 2, we obtain the second member of the family, i.e.,
$J = E\{e^4(n)\}$, known as the Least Mean Fourth (LMF)
algorithm. Walach and Widrow compared the two algorithms
above and found that under certain conditions
the LMF outperforms LMS [12]. Furthermore, it is obvious
that when far from the optimum (i.e., $e(n) > 1$) the LMF
algorithm exhibits faster convergence.

The LMS+F algorithm [8] was a first approach towards the
combination of the advantages of the two algorithms. More
precisely, LMS+F aimed at exploiting the faster initial
convergence of the LMF algorithm, while retaining the desirable
LMS characteristic of low misadjustment and immunity
over the different distributions of noise, when around
the optimum.

A mixed criterion algorithm of this type, though obtained
from a different perspective, was also presented in [3]. The
proposed algorithm was derived in an attempt to minimise
the variance of the square error subject to a constraint
on the mean square error. The constrained minimisation
method resulted in a weight update formula with a fixed
preselected mixing parameter. However, no information
was provided about the optimal value of the mixing parameter
or the constraint. The proposed combination appeared
also in [1], where a constant mixing parameter $\lambda$ was used.
In this contribution, the $\lambda$ selection criterion of [8] is
modified, so that the evolution of the $\lambda$ sequence takes into
account the noise characteristics, thus enhancing the
algorithm's performance. Moreover, the adaptive nature of the
$\lambda_n$ sequence adds flexibility to our algorithm.

2. The LMS+F Adaptive Algorithm

The following type of cost function is proposed

    $J = (1 - \lambda_n) E\{e^2(n)\} + \lambda_n E\{e^4(n)\},$    (2)

where $\lambda_n$ is a time varying scalar sequence, selected
according to the following rule

    $\lambda_{n+1} = \begin{cases} \lambda_n + \alpha, & \text{if } E\{e_n^2\} > 1 \\ \lambda_n - \alpha\,\mathrm{sgn}\{C_e^4(n)\}, & \text{otherwise} \end{cases}$    (3)

where sgn is the sign (signum) function,
$C_e^4(n) = E\{e_n^4\} - 3E^2\{e_n^2\}$ is the kurtosis (fourth order
cumulant) of the assumed zero mean error signal and $\alpha$ is a
scaling factor lying in the interval [0, 1]. By choosing
$\alpha = 1.0$ no "transient" behaviour is exhibited and the
algorithm just alternates between LMS and LMF. To preserve the
unimodal characteristic of the mixed cost function, as both of
its constituent functions are convex, $\lambda_n$ in (2) is
confined to the closed interval [0, 1].
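The selection rule (3) can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the authors' implementation: the function names `kurtosis` and `update_lambda` are hypothetical, the moments `m2` and `m4` are assumed to be running estimates of $E\{e_n^2\}$ and $E\{e_n^4\}$ supplied by the caller, and the clipping step implements the stated confinement of $\lambda_n$ to [0, 1].

```python
def kurtosis(m2, m4):
    """Fourth-order cumulant of a zero-mean signal from its 2nd and 4th moments."""
    return m4 - 3.0 * m2 ** 2


def sign(x):
    """Signum function: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def update_lambda(lam, m2, m4, alpha):
    """One step of the mixing-parameter rule (hypothetical helper).

    lam   : current mixing parameter, in [0, 1]
    m2,m4 : estimates of E{e^2} and E{e^4} (assumed given)
    alpha : scaling factor in [0, 1]
    """
    if m2 > 1.0:
        # Far from the optimum: move towards LMF (lambda = 1).
        new = lam + alpha
    else:
        # Near the optimum: follow the sign of the error kurtosis.
        new = lam - alpha * sign(kurtosis(m2, m4))
    # Keep lambda inside the closed interval [0, 1].
    return min(1.0, max(0.0, new))
```

With these definitions a sub-Gaussian error (negative kurtosis, e.g. uniform) pushes $\lambda_n$ towards 1 (LMF), while a super-Gaussian error (positive kurtosis, e.g. Laplacian) pushes it towards 0 (LMS).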

As can be seen from the first part of (3), when far from
the optimum weight vector, i.e., $e(n) > 1$, $\lambda_n$ increases,
thereby increasing the convergence speed, but as the weight
vector $H_n$ approaches its optimum value ($H^{opt}$), $\lambda_n$
adjusts itself according to the noise distribution, thereby
enhancing the algorithm's performance. The first part of
the above formula implicitly assumes that the noise variance
is lower than unity. If this is not the case, the first part can
either be ignored, or automatic gain control or normalisation
with respect to the input power can be applied to enable
its use; this part of the rule significantly
improves the initial convergence of the LMS+F algorithm
in nonstationary environments [8]. Applying the steepest
descent gradient search, and using the instantaneous value
of the gradient instead of the mean, the filter coefficient
vector update equation for the cost function (2) becomes


    $H_{n+1} = H_n + 2\mu\left((1 - \lambda_n) e(n) + 2\lambda_n e^3(n)\right) X_n$    (4)

where

    $e(n) = w_n - X_n^T V_n$    (5)

and the weight error vector is given by

    $V_n = H_n - H^{opt}.$    (6)
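A minimal NumPy sketch of the coefficient update, assuming the update has the form $H_{n+1} = H_n + 2\mu((1-\lambda_n)e(n) + 2\lambda_n e^3(n))X_n$ with $e(n)$ computed as desired output minus filter output. The function names, the fixed mixing parameter, the step size and the white Gaussian test signals are illustrative assumptions and not the settings of the paper.

```python
import numpy as np


def lms_f_step(h, x, d, lam, mu):
    """One LMS+F coefficient update (hypothetical helper).

    h: current weights, x: input vector, d: desired sample,
    lam: mixing parameter in [0, 1], mu: step size.
    """
    e = d - x @ h  # instantaneous estimation error
    h_new = h + 2.0 * mu * ((1.0 - lam) * e + 2.0 * lam * e ** 3) * x
    return h_new, e


def run_identification(n_iter=30000, lam=0.5, mu=2e-4, seed=0):
    """Identify a fixed 10-tap system; returns the final squared mismatch."""
    rng = np.random.default_rng(seed)
    h_opt = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2])
    h = np.zeros_like(h_opt)  # null initial vector
    for _ in range(n_iter):
        x = rng.standard_normal(h_opt.size)                  # unit-variance input
        d = x @ h_opt + np.sqrt(0.1) * rng.standard_normal() # noise variance 0.1
        h, _ = lms_f_step(h, x, d, lam, mu)
    return float(np.sum((h - h_opt) ** 2))
```

Setting `lam=0` reduces the step to plain LMS and `lam=1` to plain LMF, which makes the sketch convenient for side-by-side comparisons.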

We  can  now  proceed  to  analyse  the  proposed  algorithm. 


3.  Convergence  analysis  of  the  LMS+F 
algorithm 

To facilitate our analysis we introduce the commonly used
assumption that the various input vectors come from mutually
independent zero mean Gaussian distributed sequences
[2, 4, 5, 12]. Although this is untrue in many applications,
since consecutive input vectors share N − 1 entries, it is
widely accepted to capture the first order behaviour and is
extensively used in the literature to simplify the analysis
while still producing reliable results [7]. In our case,
the assumption can be relaxed, in that we only require that
the input sequence is uncorrelated with the filter weights.
Other than that, no restriction applies to the nature of the
input autocorrelation matrix R.

We also approximate the conditional expectation terms of
the form $E\{e_n^2 | V_n\}$ with the unconditional mean square
estimation error. For slow adaptation conditions, the weight
error vector oscillates around the mean value, partially
justifying the above assumption. Furthermore, according to the
central limit theorem, as the filter length increases, the
distribution of the error signal gets closer to the Gaussian one.
This assumption is not new and has led to analytic results
which agreed well with the simulated behaviour of nonlinear
algorithms [2, 6].

Finally, the estimation error $e_n$ is assumed to follow a
Gaussian distribution. This assumption is justified when
the weight vector $H_n$ varies much more slowly than the input
vector $X_n$, a condition corresponding to the slow adaptation
case. This assumption has produced reliable results
and was used successfully in [6, 11]. The approximate
validity of the above assumptions will be confirmed by the
simulation results.

Subtracting $H^{opt}$ from both sides of equation (4) and
using (6) and (5) results in

    $V_{n+1} = V_n + 2\mu\left((1 - \lambda_n)(w_n - X_n^T V_n)\right) X_n + 2\mu\left(2\lambda_n (w_n - X_n^T V_n)^3\right) X_n.$    (7)

We now wish to develop a recursive equation for the time
evolution of the correlation matrix of the weight error vector
$V_n$. Using $K_n$ to denote this correlation matrix at time
instant n, we have, by definition,

    $K_n = E\{V_n V_n^T\}.$    (8)

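Substituting (7) in (8) we obtain

    $E\{V_{n+1} V_{n+1}^T\} = E\{V_n V_n^T\}$
    $\quad + 2\mu E\{((1 - \lambda_n) e_n + 2\lambda_n e_n^3)(V_n X_n^T + X_n V_n^T)\}$
    $\quad + 4\mu^2 E\{((1 - \lambda_n) e_n + 2\lambda_n e_n^3)^2 X_n X_n^T\}.$    (9)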

To obtain the individual terms on the right hand side of the
above equation we take conditional expectations (with respect
to $V_n$) and then average over all $V_n$. We thus obtain

    $E\{e_n (V_n X_n^T + X_n V_n^T)\} = -(K_n R + R K_n).$    (10)

Adopting a similar approach and applying Price's theorem [10]
to the right hand terms of (9), we obtain [9]

    $E\{e_n^3 (V_n X_n^T + X_n V_n^T)\} = -3\sigma_{e_n}^2 (K_n R + R K_n),$    (11)

    $E\{e_n^2 X_n X_n^T\} = \sigma_{e_n}^2 R + 2 R K_n R,$    (12)

    $E\{e_n^4 X_n X_n^T\} = 3\sigma_{e_n}^4 R + 12\sigma_{e_n}^2 R K_n R,$    (13)

and

    $E\{e_n^6 X_n X_n^T\} = 15\sigma_{e_n}^6 R + 90\sigma_{e_n}^4 R K_n R.$    (14)

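Substituting (10), (11), (12), (13) and (14) in (9) we obtain
the following equation for the error correlation matrix

    $K_{n+1} = K_n - 2\mu\left((1 - \lambda_n) + 6\lambda_n \sigma_{e_n}^2\right)(K_n R + R K_n)$
    $\quad + 4\mu^2\left((1 - \lambda_n)^2 \sigma_{e_n}^2 + 12\lambda_n(1 - \lambda_n)\sigma_{e_n}^4 + 60\lambda_n^2 \sigma_{e_n}^6\right) R$
    $\quad + 4\mu^2\left(2(1 - \lambda_n)^2 + 48\lambda_n(1 - \lambda_n)\sigma_{e_n}^2 + 360\lambda_n^2 \sigma_{e_n}^4\right) R K_n R.$    (15)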
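As a check on the numerical coefficients in the last two terms, the squared nonlinearity can be expanded term by term. The shorthand $g_n$ is introduced here only for this worked step and is not notation from the paper; the moment expressions used are those of (12), (13) and (14).

```latex
g_n = (1-\lambda_n)\,e_n + 2\lambda_n e_n^3
\;\Longrightarrow\;
g_n^2 = (1-\lambda_n)^2 e_n^2 + 4\lambda_n(1-\lambda_n)\,e_n^4 + 4\lambda_n^2 e_n^6 ,

E\{g_n^2 X_n X_n^T\}
 = \big[(1-\lambda_n)^2\sigma_{e_n}^2 + 4\lambda_n(1-\lambda_n)\cdot 3\sigma_{e_n}^4
        + 4\lambda_n^2\cdot 15\sigma_{e_n}^6\big] R
 + \big[(1-\lambda_n)^2\cdot 2 + 4\lambda_n(1-\lambda_n)\cdot 12\sigma_{e_n}^2
        + 4\lambda_n^2\cdot 90\sigma_{e_n}^4\big] R K_n R .
```

The products $4 \cdot 3 = 12$, $4 \cdot 15 = 60$, $4 \cdot 12 = 48$ and $4 \cdot 90 = 360$ are the factors that appear in the recursion for $K_n$.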

Due to the nonlinear form of the above equation, an
exact convergence condition is difficult to find. To facilitate
our analysis we introduce the concept of the distance
$T_n$ [11]

    $T_n = \mathrm{tr}[\Lambda K_n'] = \mathrm{tr}[R K_n] = E\{(V_n^T X_n)^2\}.$    (16)

Adopting the approach in [4] we develop the following
sufficient and necessary condition on the step size parameter
that ensures mean square convergence

    $\mu < \frac{2\left((1 - \lambda_n) + 6\lambda_n(\sigma_w^2 + T_n)\right)}{(f_1 + f_2)(N + 2 f_3)\gamma_{max}},$    (17)

where

    $f_1 = (1 - \lambda_n)^2 + 12\lambda_n(1 - \lambda_n)(2\sigma_w^2 + T_n),$    (18)

    $f_2 = 60\lambda_n^2\left(3\sigma_w^4 + 3\sigma_w^2 T_n + T_n^2\right),$    (19)

    $f_3 = (1 - \lambda_n)^2 + 24\lambda_n(1 - \lambda_n)(\sigma_w^2 + T_n) + 180\lambda_n^2(\sigma_w^2 + T_n)^2$    (20)

and $\gamma_{max}$ stands for the maximum eigenvalue of the input
correlation matrix R.
4. Simulation results

In this final section we present and analyse the results
obtained from simulations. The algorithm is applied to a
system identification problem, where the system to be
identified is considered nonstationary. The optimum filter
coefficients assume the following initial values $H^{opt}$ = [0.2, 0.4,
0.6, 0.8, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2] and, after that,
experience random disturbances. The null vector (0) is chosen as
the initial vector (starting point) $H_0$, and all the results
are obtained by averaging over an ensemble of 100 runs.
The parameter $\lambda$ is initialised to 1, i.e., we start with the
LMF algorithm. The input is assumed Gaussian distributed
and both noise and input sequences are assumed to be zero
mean i.i.d., with input and noise variance equal to unity and
0.1 respectively. The system mismatch ($E\{V_n^T V_n\}$) is
selected as a performance measure. The mean square and
the mean fourth value of the error are estimated using the
following formula

    $\hat{E}\{e_n^k\} = \beta \hat{E}\{e_{n-1}^k\} + (1 - \beta) e_n^k,$    (21)


Figure 2: Performance comparison between the LMS (dotted
line), the LMF (dash-dot line) and the LMS+F algorithm
(solid line) under Gaussian distributed noise conditions


where k = 2, 4 and the constant $\beta$ is a memory controlling
factor. The larger the value it assumes, the "stronger" the
memory of the system. Alternatively, a finite length moving
window could be used.
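The recursive moment estimate above amounts to an exponentially weighted average. The sketch below illustrates it for k = 2 and k = 4; the class name `MomentTracker` and the zero initial values are assumptions made for this example, not details given in the text.

```python
class MomentTracker:
    """Exponentially weighted estimates of E{e^2} and E{e^4} (illustrative)."""

    def __init__(self, beta):
        self.beta = beta  # memory factor: closer to 1 means longer memory
        self.m2 = 0.0     # running estimate of E{e^2} (assumed start value)
        self.m4 = 0.0     # running estimate of E{e^4} (assumed start value)

    def update(self, e):
        """Fold one new error sample into both moment estimates."""
        b = self.beta
        self.m2 = b * self.m2 + (1.0 - b) * e ** 2
        self.m4 = b * self.m4 + (1.0 - b) * e ** 4
        return self.m2, self.m4
```

For a constant error the estimates converge geometrically to the true powers, at a rate set by $\beta$; the pair `(m2, m4)` is what a kurtosis-based selection rule would consume.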




Figure 3: Performance comparison between the LMS (dotted
line), the LMF (dash-dot line) and the LMS+F algorithm
(solid line) under Laplacian distributed noise conditions


Figure 1: Performance comparison between the LMS (dotted
line), the LMF (dash-dot line) and the LMS+F algorithm
(solid line) under uniformly distributed noise conditions

Figures 1, 2 and 3 depict the performance of
all the algorithms, namely LMS, LMF and LMS+F, under
various noise conditions. The parameters of the above
algorithms were chosen so as to match either the steady
state error (misadjustment) or the convergence rate,
depending on the insight they provide into the performance
of the mixed algorithm. The chosen values are as follows:
f^LMS  =  4.5  •  fiLMF  =  PLAf5+F  =  1-8  *  10““^  for 

the  uniformly  distributed  noise  case  (figure  1)  and  fiiMs  = 
t^LMF  =  /i£,Af5+F  =  1.8* for  the  gaussian  and  laplacicin 
distributed noise cases in figures 2 and 3 respectively. The
random disturbances are assumed to follow a uniform
distribution with variance 0.3 in figure 1 and 0.25
in figures 2 and 3.

In figure 1 we see that the LMS+F algorithm behaves
as the LMF, providing fast convergence and low steady state
error. To achieve the same steady state error with the LMS
algorithm, we had to significantly decrease the convergence
factor $\mu$.

In figures 2 and 3 we observe that the LMS+F adopts
the initial high tracking speed of the LMF algorithm but, as
it approaches the optimum, it gradually changes to the LMS
algorithm (Laplacian noise case) or combines both criteria
(Gaussian noise case) to achieve a lower steady state error.

The adaptability of the LMS+F algorithm to time varying
noise distributions is shown in figure 4, where its
performance is also compared with that of LMS and LMF. As
in the previous figures, the LMS+F exploits the fast initial
convergence of the LMF algorithm and after that adopts the
performance criterion that exhibits the lower steady state
error.

Figure 4: Performance curves of the algorithms under time
varying noise conditions (LMS: dotted, LMF: dash-dot,
LMS+F: solid)

Figure 5: Comparison between the theoretically obtained
upper bound on $\mu$ (solid ($\sigma_w^2 = 0.1$) and dashed
($\sigma_w^2 = 0.3$) lines) and the ones obtained from simulations
(stars ($\sigma_w^2 = 0.1$) and circles ($\sigma_w^2 = 0.3$))
($\lambda = \delta = 1/2$).

Next, we focus our attention on the theoretically obtained
upper allowable value for the convergence factor,
which is a function of the parameter $\lambda$. When $\lambda$ is allowed
to take any value in the closed interval [0, 1] the convergence
condition is dictated by that of LMF (i.e., $\lambda = 1$), since it
poses more stringent convergence bounds. However, if, for
any reason, the $\lambda_n$ sequence is constrained to $[0, \delta]$, where
$\delta < 1$, then the necessary and sufficient condition is
obtained from equation (17) by setting $\lambda$ equal to $\delta$. This
latter case is depicted in figure 5, where a comparison
between the expected (Eq. 17) and observed upper bounds is
presented for two values of noise variance and $\delta$ equal to
1/2. It is easily observed that the simulation results are in
good agreement with the theoretical ones, especially when
far from the optimum. The observed deviation from the
theoretical curves near the optimum is justified by noting that
the theoretical results are obtained using averages, whereas
in simulations we deal with the instantaneous values of the
stochastic variables. It can also be observed that, when far
from the optimum, the condition on the step size is dominated
by the distance $T_n$ and is almost independent of the
noise variance.

5.  Conclusions 

In this contribution, a new technique for mixing the LMS
and LMF cost functions was presented. It differs from what
was previously suggested in that it is time varying and takes
into account the noise distribution. The proposed algorithm
was analysed and conditions regarding its behaviour and
stability were established. Simulation results verify the
improved performance of the proposed algorithm and agree
well with the theoretical results.

Abstract

This paper presents a novel nonlinear filter for narrowband
interference suppression in multiple access
communication systems, in particular code division
multiple access spread spectrum. The proposed algorithm
combines a recursive Hidden Markov Model estimator,
a Kalman filter and the recursive Expectation
Maximization algorithm. It is shown that the proposed
algorithm outperforms current linear and nonlinear
filtering techniques, presented in Rusch and Poor [3].


1  Introduction 

Code Division Multiple Access (CDMA), also known
as Spread Spectrum Multiple Access (SSMA), provides
a means of separating the signals of multiple users
transmitting simultaneously and occupying the same
RF bandwidth. In a Direct Sequence (DS) CDMA system
each user has a distinct pseudonoise (PN) code
(or sequence). The message from each user is modulated
with the corresponding PN code, resulting in a
transmission bandwidth much greater than the message
bandwidth.

One of many reasons for spreading the spectrum is
the inherent immunity of the communication system to
interference. It is well known that system performance
is greatly enhanced if the receiver employs some means
of suppressing narrowband interference prior to signal
"despreading". Such techniques are possible since they
exploit the different nature of the received signals. The
power spectrum of a narrowband interference is highly
peaked, while the message signals are spread over a
wide bandwidth.

Partially supported by the Australian Telecommunications
and Electronics Research Board (ATERB), Australian Research
Council (ARC) and the Co-operative Research Centre for Sensor
Signal and Information Processing (CSSIP)

Given T observations of the received noisy signal,
computing optimal estimates of the narrowband interference
and the finite-state spread spectrum signal requires
considering all $N^T$ realizations of the N-state,
T-point spread spectrum signal. This is computationally
prohibitive, thus the only feasible estimators are
suboptimal.

Nonlinear suboptimal techniques for narrowband
interference suppression in spread spectrum systems are
presented in [1, 2, 3]. These offer improved performance
over the time domain linear methods summarized
in [1]. The narrowband interference is modeled
as a Gaussian autoregressive process. Assuming known
parameters, the interference signal is estimated using
an Approximate Conditional Mean (ACM) filter [4].
For the specified assumptions on the observation process
in [3], the ACM filter for interference estimation
turns out to be a Kalman-type recursive filter with
nonlinearities to deal with the finite state spread spectrum
signal. The nonlinearities take the form of a soft decision
feedback which seeks to remove the spread spectrum
signal from the estimation of the narrowband
interference. For the general case of unknown interference
statistics the soft decision feedback was incorporated
into an LMS adaptive filter [2]. A modification
and an enhancement to this approach is presented in
[3], where multiple spread spectrum users are considered.
The proposed Enhanced Nonlinear Adaptive algorithm
(ENA) of [3] shows a significant improvement
over existing linear and nonlinear adaptive filters.

In this paper, we present a new nonlinear filter and
parameter estimator for narrowband interference
suppression in spread spectrum systems. We consider a
slightly more general signal model than [3]. In particular,
we model the received sampled signal as the sum
of the spread spectrum signal (modeled as a finite state
Markov chain), narrowband interference (modeled as a
Gaussian autoregressive (AR) process) and observation
noise (modeled as a zero mean white Gaussian process).

There are at least two reasons that justify modeling
the spread spectrum signal as a finite state Markov
chain: i) asynchronous multi-user transmission and
ii) oversampling. Each of these cases induces correlation
in the received spread spectrum signal, and hence
the first-order Markov chain assumption is more realistic
than the iid assumption. In examples where the
above two reasons do not hold and the spread spectrum
signal is iid, simulations show that our algorithm still
performs as well as or better than the ENA/ACM algorithm
in [3]. Moreover, our algorithm has a comparable
computational cost to the algorithms in [3].

Highlights of our HMM-KF algorithm

Our algorithm cross-couples two optimal filters: a
Hidden Markov Model (HMM) filter and a Kalman
Filter (KF). As described below, together with the
recursive EM algorithm, these coupled filters yield
estimates of the narrowband interference, the spread spectrum
signal and their parameters. We call our algorithm the
HMM-KF algorithm.

Methodology: We first explain the motivation
for our algorithm. If the narrowband interference is
exactly known at a given time, then the estimation
task reduces to the problem of extracting the finite-state
Markov chain (spread spectrum) in additive white
Gaussian observation noise. The Markov chain can
be optimally extracted using the well known Hidden
Markov Model filter. Furthermore, on-line (or adaptive)
parameter estimates of the HMM (including transition
probabilities and noise variances) can be obtained
via the recursive EM algorithm presented in [5].

On the other hand, if the spread spectrum signal
is exactly known at a given time, then the estimation
problem reduces to the problem of extracting
a Gaussian autoregressive process (narrowband
interference) embedded in white Gaussian noise. Then
optimal state estimates of the narrowband interference are
obtained using a Kalman filter [5]. Furthermore, on-line
parameter estimates (autoregressive coefficients,
process and observation variances) can be achieved via
the recursive EM algorithm [5, 6].

Since we do not have exact knowledge of the narrowband
interference or the spread spectrum signal, we
propose the following scheme: cross-couple the above
two recursive EM algorithms, one algorithm for the
HMM and the other for the noisy AR model. The proposed
algorithm is called the HMM-KF algorithm. It
is schematically shown in Fig. 1.

2  Problem  Formulation 

We  now  present  the  signal  model  in  detail  and  state 
our  estimation  objectives. 

2.1  A  Model  for  the  Received  Signal 

We assume a similar spread spectrum and narrowband
interference signal model to [2]. In particular,
assume that the continuous-time received signal $y(t)$
consists of the spread spectrum signal $s(t)$ from all N
users, the narrowband interference $i(t)$ and the white
Gaussian observation noise $n(t)$. That is:
$y(t) = s(t) + i(t) + n(t)$. If $y(t)$ is sampled at the chip rate of the
PN sequence, the resulting discrete time observations
[3] are:

    $y_k = s_k + i_k + n_k$    (1)

where $n_k$ is a zero mean white Gaussian process with
variance $\sigma_n^2$ and $s_k$ is a discrete-state iid process.

The unknown narrowband interference $i_k$ is modeled
as a Gaussian autoregressive process of known order p,
which can be written as

    $i_k = -d_1 i_{k-1} - \cdots - d_p i_{k-p} + e_k$    (2)

where $e_k$ is white Gaussian noise, independent of $s_k$
and $n_k$, with zero mean and variance $\sigma_e^2$.

In [3], the spread spectrum signal $s_k$ is the sum of N
independent and identically distributed (iid), equiprobable,
binary random variables. This is so, since the user
message and the PN sequence are assumed purely random.
Furthermore, it is also assumed that all users are
received at the same (normalized) power and are chip
synchronous.
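The chip-rate model $y_k = s_k + i_k + n_k$ with an AR(p) interferer can be simulated in a few lines. The following NumPy sketch is only an illustration under stated assumptions: the function name `simulate_received`, the zero initial conditions of the AR recursion and the example coefficients are not taken from the paper, and the users are taken to transmit ±1 with equal probability.

```python
import numpy as np


def simulate_received(n_samples, n_users, d, sigma_e, sigma_n, rng):
    """Simulate y_k = s_k + i_k + n_k (illustrative, chip-synchronous iid case).

    d       : AR coefficients (d_1, ..., d_p) of the interference
    sigma_e : standard deviation of the AR driving noise e_k
    sigma_n : standard deviation of the observation noise n_k
    """
    d = np.asarray(d, dtype=float)
    p = d.size
    # Spread spectrum part: sum of n_users equiprobable +/-1 chips per sample.
    s = rng.choice([-1.0, 1.0], size=(n_samples, n_users)).sum(axis=1)
    # AR(p) interference: i_k = -d_1 i_{k-1} - ... - d_p i_{k-p} + e_k.
    i = np.zeros(n_samples + p)  # first p entries are zero initial conditions
    e = sigma_e * rng.standard_normal(n_samples + p)
    for k in range(p, n_samples + p):
        past = i[k - p:k][::-1]  # (i_{k-1}, ..., i_{k-p})
        i[k] = -np.dot(d, past) + e[k]
    i = i[p:]
    n = sigma_n * rng.standard_normal(n_samples)
    return s + i + n, s, i
```

For example, `d = [-1.8, 0.81]` gives a stable, strongly narrowband interferer (a double pole at 0.9), which makes the contrast with the wideband chip sequence easy to see in a spectrum plot.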

In this paper we assume that the received spread
spectrum signal $s(t)$ is sampled at a rate higher than
the chip rate of the PN sequence. This yields samples
which are correlated in time. Hence, we assume $s_k$ is
a finite-state, discrete-time, homogeneous, first-order
Markov chain. Consequently, the state $s_k$ at time k is
one of a finite number M of states $q = (q_1, q_2, \ldots, q_M)$.
The transition probability matrix is $A = (a_{mn})$, where
$a_{mn} = P(s_{k+1} = q_n | s_k = q_m)$ and $m, n \in \{1, \ldots, M\}$.
Of course $a_{mn} \ge 0$ and $\sum_{n=1}^{M} a_{mn} = 1$ for each m. Let $\pi$
denote the initial state probability vector: $\pi = (\pi_m)$,
$\pi_m = P(s_1 = q_m)$.

Remark: The number of levels M is equal to N + 1
when all N users are received at the same power. If
each user transmits ±1, then $s_k \in \{-N, -N + 2, \ldots, N\}$.

Increasing the sampling rate will yield
a transition probability matrix A with increasingly
dominant diagonal elements. Setting the sampling
rate to the chip rate yields the signal model in [3]. This
is merely a special case of our signal model with all the
rows in A identical.
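The role of the transition matrix can be illustrated by sampling a chain. In the sketch below, which is not part of the original paper, the function name `sample_chain`, the two-level alphabet and the numerical matrices are assumptions chosen only to contrast a diagonally dominant A (oversampled case) with an A whose rows are identical (iid case).

```python
import numpy as np


def sample_chain(A, q, n_samples, rng):
    """Draw a first-order Markov chain with levels q and transition matrix A."""
    A = np.asarray(A, dtype=float)
    if np.any(A < 0) or not np.allclose(A.sum(axis=1), 1.0):
        raise ValueError("rows of A must be probability vectors")
    M = len(q)
    states = np.empty(n_samples, dtype=int)
    states[0] = rng.integers(M)  # uniform initial state, pi_m = 1/M
    for k in range(1, n_samples):
        # Row m of A holds P(next state = n | current state = m).
        states[k] = rng.choice(M, p=A[states[k - 1]])
    return np.asarray(q)[states], states
```

With `A = [[0.9, 0.1], [0.1, 0.9]]` consecutive samples repeat about 90% of the time, whereas with `A = [[0.5, 0.5], [0.5, 0.5]]` each sample is independent of the previous one.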

2.2  Estimation  Objectives 

Let $\phi_0 = (A, q, D, \sigma_e^2, \sigma_n^2)$ denote the true parameter
vector that characterizes the narrowband interference
(AR signal) and the spread spectrum signal (Markov
chain).

Given the observations $Y_k = (y_1, \ldots, y_k)$, our aim is
twofold:

1. State Estimation: Compute estimates of the
narrowband interference $i_k$ and the spread
spectrum signal $s_k$.

2. Parameter Estimation: Derive a recursive
estimator $\hat{\phi}_k$ of $\phi_0$ for $k \ge 1$, given the
observations $Y_k$.

Why estimate A and q? When the received power levels
are time-varying and asynchronous data transmission
is used, A and q may not be known a priori. Indeed,
A is a complicated function of the number of levels
M, the transmitted power of each user, the sampling rate
and the transmission mode (synchronous or asynchronous).
Hence the motivation for estimating A and q.

Remark: We assume that the number of states M of
the Markov chain is known. Also, for convenience we
assume that $\pi_m = 1/M$, for $m = 1, \ldots, M$.

From (1) and (2) it is quite clear that computing
optimal (maximum a posteriori, MAP) state estimates
of $i_k$ and $s_k$ and computing optimal (maximum
likelihood, ML) parameter estimates of the signal model
is computationally infeasible, since the computational
cost is exponential in the data length. In the following
section we present our sub-optimal nonlinear algorithm
for narrowband interference suppression, which
combines two optimal filters to achieve both state and
parameter estimation.

3  HMM-KF Narrowband Interference
Suppression Algorithm

In this section we present our HMM-KF algorithm.
The HMM-KF algorithm cross-couples two recursive
EM algorithms, one algorithm for an HMM and the
other for a noisy AR model. The algorithm is schematically
shown in Fig. 1:

1. At time k, the Kalman filter and recursive EM
parameter estimator for the narrowband interference
yield estimates of the state $i_k$, the process noise
variance $\sigma_e^2$, the observation noise variance $\sigma_n^2$, and the AR
coefficients $d_1, \ldots, d_p$.

2. The Hidden Markov Model filter and recursive EM
parameter estimator for the spread spectrum signal
give on-line estimates of the state $s_k$, the transition
probability matrix A and the Markov chain levels q.

We now present Steps 1 and 2 in more detail.

3.1  Spread Spectrum Signal Estimator Using Recursive
HMMs

At time k, we have available the predicted narrowband
interference $\hat{i}_{k|k-1}$ and the variance of the
prediction error $w_k = i_k - \hat{i}_{k|k-1}$, obtained from the KF
of Sec. 3.2. Therefore, the HMM to be estimated is:

HMM Signal Model:

    $y_k - \hat{i}_{k|k-1} = s_k + w_k + n_k$    (3)

We assume that the Kalman prediction error $w_k$ is modeled
as a zero mean white Gaussian process, whose variance is
the prediction error variance supplied by the KF, and is
independent of the observation noise $n_k$.

The  recursive  HMM  estimator  recursively  updates 
the  state  and  parameter  estimates  of  the  HMM.  The 
recursive  HMM  parameter  vector  estimate  at  k  is  de- 
noted  as: 

Given the signal model (3), the state and adaptive
parameter estimation procedure for the spread spectrum
signal $s_k$ is straightforward. Details can be found
in [7].
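For orientation, one recursion of a standard HMM filter for a finite-level chain observed in additive Gaussian noise can be sketched as follows. This is a generic textbook recursion written under assumptions, not the estimator of [7]: the name `hmm_filter_step` is hypothetical, the parameters A and q are taken as known, and `var` is assumed to be the combined variance of the two independent Gaussian terms $w_k$ and $n_k$.

```python
import numpy as np


def hmm_filter_step(alpha_prev, A, q, z, var):
    """One HMM filter recursion (generic sketch).

    alpha_prev : filtered state probabilities at time k-1, shape (M,)
    A          : transition matrix, A[m, n] = P(s_{k+1}=q_n | s_k=q_m)
    q          : Markov chain levels, shape (M,)
    z          : observation with the predicted interference removed
    var        : variance of the total Gaussian disturbance (assumed known)
    """
    pred = A.T @ alpha_prev                  # one-step state prediction
    lik = np.exp(-0.5 * (z - q) ** 2 / var)  # Gaussian likelihood of each level
    post = pred * lik
    post /= post.sum()                       # normalise to a probability vector
    s_hat = float(post @ q)                  # conditional-mean (soft) estimate
    return post, s_hat
```

The returned `s_hat` is a soft estimate of the spread spectrum sample; subtracting it from the observation is what would feed an interference estimator in a cross-coupled scheme of this kind.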

3.2  Recursive Narrowband Interference Estimation
Using Kalman Filtering

The HMM estimator described in Sec. 3.1 yields filtered
state estimates of the spread spectrum signal.
Given the spread spectrum signal estimate $\hat{s}_{k|k}$ and the
associated error variance of $w_k = s_k - \hat{s}_{k|k}$, our
aim now is to compute state and parameter estimates
of the narrowband interference. The signal model is
given by:

    $y_k - \hat{s}_{k|k} = i_k + w_k + n_k$    (4)

where $y_k - \hat{s}_{k|k}$ is the observation and $i_k$ is the narrowband
interference signal. $w_k$ is modeled as a zero mean white
Gaussian process, whose variance is the error variance
associated with $\hat{s}_{k|k}$, and is assumed independent of the
observation noise $n_k \sim N(0, \sigma_n^2)$ and the process noise
$e_k \sim N(0, \sigma_e^2)$. (4) can be represented as the following
state space model:

State Space Model:

    $x_k = F x_{k-1} + G e_k$
    $y_k - \hat{s}_{k|k} = H x_k + w_k + n_k$    (5)

where the state vector $x_k = (i_k, i_{k-1}, \ldots, i_{k-p})'$,

    $F = \begin{pmatrix} -D' & 0 \\ I_{p \times p} & 0_{p \times 1} \end{pmatrix}, \quad D = (d_1, \ldots, d_p)', \quad G = (1 \; 0_{1 \times p})', \quad H = (1 \; 0_{1 \times p}).$    (6)

The recursive EM estimator recursively updates the
narrowband interference autoregressive coefficients, the
narrowband interference process noise and observation
noise. The recursive EM parameter estimate at k is

denoted  as:  <^|^p  =  \  <^n^  ^)- 

Given the signal model of (5), the state and adaptive
parameter estimation procedure for the narrowband
interference $i_k$ is straightforward. Details can be found
in [7].
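A state space model of this kind can be exercised with a textbook Kalman recursion. The sketch below is an illustration under assumptions rather than the estimator of [7]: it uses the standard p-dimensional companion form for an AR(p) process, which may differ in dimension from the matrices used in the paper; the AR coefficients and variances are taken as known; and `var_r` is assumed to be the combined variance of the independent terms $w_k$ and $n_k$. The names `ar_state_space` and `kalman_step` are hypothetical.

```python
import numpy as np


def ar_state_space(d):
    """Companion-form (F, G, H) for i_k = -d_1 i_{k-1} - ... - d_p i_{k-p} + e_k."""
    d = np.asarray(d, dtype=float)
    p = d.size
    F = np.zeros((p, p))
    F[0, :] = -d                 # first row carries the AR recursion
    F[1:, :-1] = np.eye(p - 1)   # remaining rows shift the past samples down
    G = np.zeros(p)
    G[0] = 1.0                   # driving noise enters the newest sample only
    H = np.zeros(p)
    H[0] = 1.0                   # only the newest sample is observed
    return F, G, H


def kalman_step(x, P, z, F, G, H, var_e, var_r):
    """One predict/update cycle for a scalar observation z = H x + noise."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + var_e * np.outer(G, G)
    S = H @ P_pred @ H + var_r               # innovation variance (scalar)
    K = P_pred @ H / S                       # Kalman gain (vector)
    x_new = x_pred + K * (z - H @ x_pred)
    P_new = P_pred - np.outer(K, H @ P_pred)
    return x_new, P_new, x_pred, P_pred
```

The predicted state `x_pred[0]` plays the role of the predicted interference sample that would be subtracted from the observation before the spread spectrum signal is estimated.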
4  Conclusion

We have presented a new solution to the problem of
narrowband interference suppression in CDMA spread
spectrum communication systems. Nonlinear Hidden
Markov Model (HMM) signal processing techniques
together with a Kalman Filter (KF) were used to derive
a high-performance algorithm for suppressing the
narrowband interference and simultaneously yielding
estimates of the spread spectrum signal. Computer
simulations show that the proposed algorithm outperforms
current linear and other nonlinear filtering techniques.
Although the algorithm is difficult to analyze theoretically,
simulation studies in [7] have shown satisfactory
estimates in several cases.
Figure 1. Proposed Adaptive Narrowband Interference Suppression Algorithm.
Abstract

A new blind FIR filter receiver is proposed for the detection of DS-CDMA signals in unknown Multi-User Interference (MUI) and Additive White Gaussian Noise. The proposed receiver is motivated by the Wiener signal reconstruction theory and it is a very low complexity alternative to the Minimum-Output-Energy (MOE) receiver [8]. At only a minimal increase of the computational cost, the proposed detector significantly outperforms the conventional Matched Filter (MF) receiver. It also compares favorably to the decorrelating detector [2] (with similar near-far resistance), despite the fact that the latter utilizes the assumed known MUI spreading codes. The novel characteristic of the proposed receiver is the incorporation of an auxiliary vector component that allows statistically optimal, adaptive steering of the filter with respect to the incoming DS-CDMA signal.

1.  Introduction 

Spreading the spectrum of a signal to make it virtually indistinguishable from background noise has served as the basic principle that led to the development of spread spectrum communication systems. The main motivation for the development of such systems emerged from military communication needs to ensure effective suppression of intentional interference as well as to increase security in signal transmission. Currently, Direct-Sequence Code Division Multiple Access (DS-CDMA), a specific form of spread spectrum transmission, is receiving considerable interest in response to an ever-increasing demand for better utilization of the available resources in mobile radio and personal communication environments.

While the overall capacity of a CDMA system is determined by both the forward (base-to-mobile) and the reverse (mobile-to-base) link, most of the recent research has focused on the reverse connection and dealt with processing at the base station under the assumption of a known active user population. Given the unrealizable complexity and the prohibitive computational requirements that the optimal multiuser detector exhibits [1], any proposal for a suboptimal reduced complexity receiver is well justified. Arguably, the list of such successful proposals includes the decorrelating receiver [2], multistage architectures [3]-[5], and decision feedback detectors [6].

In cellular systems, multiuser detection is envisioned at the base station for the simultaneous recovery of all signals of the known intracell users. While this is the situation in the reverse link, in the forward link the mobile user faces an even more challenging problem, namely the detection of its own signal in the presence of unknown multiuser interference and additive white Gaussian channel noise. In addition, processing at the mobile should meet even tighter complexity, size, and weight requirements than the base station. The handy matched filter (MF) solution exhibits unacceptable performance degradation in the presence of one or more high-power interferers (the "near-far" problem [1]). Therefore, it can only be used with some form of stringent and costly power control. Recently, two interesting linear-filter alternatives were proposed in the form of a Minimum-Mean-Square-Error (MMSE) filter that requires a separate training sequence [7] and a blind "minimum energy" linear receiver [8]. If $L$ is the system processing gain (as high as 127), then the latter blind receiver may require frequent inversion of the $L \times L$ sample autocorrelation matrix of the received signal. Equivalently, the adaptive implementation requires steepest descent in the $R^L$ space.

In this paper we reconsider the issues of single-user detection in unknown spread-spectrum MUI and AWGN from the Wiener signal reconstruction viewpoint and we propose a new blind low-complexity receiver for the forward link of DS-CDMA communication systems. We examine low-complexity alternatives to the work presented in [8] that maximize the Signal-to-Interference-plus-Noise Ratio (SINR) [10]. The proposed receiver is a very low complexity alternative to the MOE detector and, at only a minimal increase of the computational cost, significantly outperforms the conventional MF detector. It also compares favorably to the decorrelating detector (with similar near-far resistance), although the latter utilizes the assumed known MUI spreading codes.


2. System Modeling

Although the results that follow are directly applicable to the asynchronous case, for the sake of brevity and clarity of presentation we choose to present our work in the context of synchronous CDMA. We consider a CDMA system where $K$ users transmit synchronously over an AWGN channel. The continuous-time received signal is modeled as follows:

$$r(t) = \sum_i \sum_{k=0}^{K-1} \sqrt{E_k}\, b_k(i)\, s_k(t - iT) + n(t), \qquad (1)$$

where for the $k$-th user $E_k$ is the received energy, $b_k(i)$ is the $i$-th information bit, and $s_k(t)$ is the signature (spreading code). $T$ is the symbol (bit) period and $n(t)$ is the channel AWGN. The signature of every user is composed of $L$ spreading chips and it is of the form $s_k(t) = \sum_{j=1}^{L} c_k(j) P_{T_c}(t - (j-1)T_c)$, where $L$ is the so-called system processing gain, $c_k(j)$, $j = 1, \ldots, L$, are the assigned signature bits for the $k$-th user, and $P_{T_c}(t)$ is the spreading pulse with duration $T_c = T/L$. Without loss of generality the signatures are assumed to be normalized. Since we consider synchronous transmission we concentrate on a single information bit interval of length $T$. The $R^L$ discrete-time version of (1) is

$$r = \sum_{k=0}^{K-1} \sqrt{E_k}\, b_k S_k + n. \qquad (2)$$

The random vector $n$ is assumed to be white Gaussian (WG) with autocorrelation matrix $E\{n^T n\} = \sigma^2 I_{L \times L}$. Assuming that the user of interest is user 0, it is convenient to define the multiuser interference (MUI) term $I = \sum_{k=1}^{K-1} \sqrt{E_k}\, b_k S_k$. This allows us to write (2) as follows:

$$r = \sqrt{E_0}\, b_0 S_0 + I + n. \qquad (3)$$

3. Blind Low-Complexity Detectors

In this section we use the theory of Wiener signal-reconstruction filters to derive simple linear receivers that maximize the output SINR. Before we proceed with the proposed scheme, we opt to present a theorem that identifies a Wiener reconstruction filter with an inherent MUI cancellation property.

Let $\{S_0, G_1, \ldots, G_{L-1}\}$ be a set of orthonormal vectors in $R^L$ (an orthonormal basis of $R^L$ that includes $S_0$). For some arbitrary scalar $k$ ($k \neq 0$) and $\forall G_i$ we define

$$G_i^* = G_i + k S_0, \quad i = 1, \ldots, L-1. \qquad (4)$$

Theorem 1 [10]: Let $S_0$ and $G_1^*, G_2^*, \ldots, G_{L-1}^*$ defined by (4) for some $k \neq 0$ be vectors in the $R^L$ space used as input data sequences for the Wiener reconstruction of the received signal vector $r$ in (2) with corresponding tap-weights $w_0, w_1, \ldots, w_{L-1}$. Then,

(i) the optimal weighting coefficient $w_0$ is

$$w_0 = E_n\Big\{\Big\langle r,\; S_0 - k \sum_{i=1}^{L-1} G_i \Big\rangle\Big\}, \text{ and} \qquad (5)$$

(ii) for any instance of the received signal $r$ the filter $\langle r, S_0 - k \sum_{i=1}^{L-1} G_i \rangle$ cancels completely all MUI vectors $I$ within the interference subspace $V_I$ spanned by $\{G_1^*, \ldots, G_{L-1}^*\}$. □

Part (ii) of Theorem 1 motivates the proposal for a detector of the form

$$\hat{b}_0 = \mathrm{sgn}(w_0). \qquad (6)$$
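The cancellation property in part (ii) of Theorem 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the standard basis of R^L as a hypothetical orthonormal basis (with S_0 as the first basis vector), and the names `cancelling_filter` and `demo` are not from the paper. It forms the vectors of (4), draws a random interference vector in their span, and evaluates the inner product with S_0 - k * sum_i G_i.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cancelling_filter(s0, g_list, k):
    """Return S0 - k * sum_i G_i, the filter that appears in Theorem 1."""
    return [s - k * sum(g[j] for g in g_list) for j, s in enumerate(s0)]


def demo(L=5, k=0.7, seed=1):
    rng = random.Random(seed)
    # Hypothetical orthonormal basis: the standard basis, with S0 = e_0.
    basis = [[1.0 if i == j else 0.0 for j in range(L)] for i in range(L)]
    s0, g_list = basis[0], basis[1:]
    # G_i* = G_i + k S0, as in (4).
    g_star = [[g[j] + k * s0[j] for j in range(L)] for g in g_list]
    w = cancelling_filter(s0, g_list, k)
    # Any interference in the subspace V_I is a linear combination of the G_i*.
    coeffs = [rng.uniform(-3.0, 3.0) for _ in g_star]
    interference = [sum(c * gs[j] for c, gs in zip(coeffs, g_star)) for j in range(L)]
    leak = dot(interference, w)   # residual MUI at the filter output
    gain = dot(s0, w)             # response in the direction of interest
    return leak, gain
```

The residual vanishes because each G_i* contributes -k from its G_i component and +k from its k S_0 component, while the response to S_0 itself stays at one.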

The following proposition places the conventional MF and the decorrelating receiver in the Wiener filter context.

Proposition 1: (i) The matched filter (MF) receiver $\langle r, S_0 \rangle$ is the result of optimal single-tap Wiener reconstruction of the received signal vector $r$ with input data sequence $S_0$. (ii) In a $K$-user CDMA environment, the decorrelating receiver is the result of optimal $K$-tap Wiener reconstruction of the received signal vector $r$ with input data sequences the user spreading codes $S_0, S_1, \ldots, S_{K-1}$, that are all assumed known. □

Remarks: 1) Perfect interference cancellation for any arbitrary parameter $k \neq 0$ is guaranteed only for interference vectors $I$ on the $V_I$ hyper-plane. Cancellation of interferers $I$ not on $V_I$ requires tuning of the parameter $k \neq 0$. 2) The size (length) $L$ of the adaptive Wiener reconstruction implementation may render the proposed detector unrealizable for a mobile user. Therefore, careful consideration of lower size Wiener reconstruction filters appears well motivated. 3) A result similar to part (ii) of Proposition 1 was given in [9] in the statistically equivalent context of Least Squares.

Following our notation, let $G$ be an arbitrary normalized vector in $R^L$ orthogonal to the spreading code of the user of interest $S_0$. In other words,

$$\langle G, S_0 \rangle = 0 \quad \text{and} \quad \langle G, G \rangle = 1. \qquad (7)$$

The following result is a direct corollary to Theorem 1:

Corollary 1: Let $S_0$ and $G$ be as in (7) and let $k$ be a non-zero scalar. Consider the 2-tap Wiener reconstruction filter with input data sequences $S_0$ and $G + kS_0$, and corresponding tap-weights $w_0$ and $w_1$. Then,

(i) the MS optimal value of $w_0$ is

$$w_0 = E_n\{\langle r, S_0 - kG \rangle\}, \text{ and} \qquad (8)$$

(ii) the linear filter $S_0 - kG$ cancels all MUI vectors $I$ in the direction of $G + kS_0$.

Again, the proposed decision statistic is the tap-weight $w_0$ itself and the detector is $\hat{b}_0 = \mathrm{sgn}(w_0)$ as in (6). A sample average can be used in place of the expectation with respect to $n$ in (8), assuming that multiple samples of $r$ are available. Therefore, for notational simplicity we can drop the expectation from (8) without any loss of generality. With input $r$ given by (3), the output of the filter in (8) is

$$\langle r, S_0 - kG \rangle = \sqrt{E_0}\, b_0 + \langle I, S_0 \rangle - k \langle I, G \rangle + \langle n, S_0 - kG \rangle. \qquad (9)$$

Then for fixed auxiliary vector $G$ the average variance of the output (the expectation is taken with respect to $b_0$, $I$, and $n$) is

$$E\{\langle r, S_0 - kG \rangle^2\} = E_0 + E\{[\langle I, S_0 \rangle - k \langle I, G \rangle]^2\} + (1 + k^2)\sigma^2. \qquad (10)$$

While the MUI cancellation property of the receiver described in part (ii) of Corollary 1 is unsatisfactory, (9) shows that we can still succeed in canceling effective MUI if we choose the auxiliary vector $G$ to be the average (normalized) projection of $r$ onto the subspace orthogonal to $S_0$ and we use $k$ as a steering parameter that places the filter $S_0 - kG$ orthogonally to the interference vector $I$. Moreover, (10) shows that classical blind minimum variance optimization of the filter $S_0 - kG$ (which is distortionless in the direction of interest $S_0$) leads in fact to a maximum Signal to Interference plus Noise Ratio (SINR) receiver. The following proposition optimizes the filter with respect to the scalar $k$. The auxiliary vector $G$ is defined immediately after.

Proposition 2: If $S_0$ is the spreading code of the user of interest and $G$ is some auxiliary vector with realizations constrained by (7), then the value of the steering scalar $k$ that minimizes the variance expression $E\{\langle r, S_0 - kG \rangle^2\}$ (maximizes the average output SINR) is

$$k_{MVDR} = \frac{E\{\langle r, S_0 \rangle \langle r, G \rangle\}}{E\{\langle r, G \rangle^2\}}. \;\; □ \qquad (11)$$

The auxiliary vector $G$ with realizations constrained by (7) is, within a sign ambiguity, the average normalized projection of the received signal vector $r$ onto the subspace orthogonal to the spreading code $S_0$. The sign of this projection can be either $\mathrm{sgn}(\langle r, S_0 \rangle)$ or $-\mathrm{sgn}(\langle r, S_0 \rangle)$. Therefore, without loss of generality, if we write

$$r_{\perp S} = \mathrm{sgn}(\langle r, S_0 \rangle)\, \frac{r - \langle r, S_0 \rangle S_0}{\sqrt{\|r\|^2 - \langle r, S_0 \rangle^2}}, \qquad (12)$$

then

$$G = \frac{E\{r_{\perp S}\}}{\|E\{r_{\perp S}\}\|}. \qquad (13)$$

The proposed receiver is completely defined by equations (6), (8), (11), (12), and (13). In the next section we present some numerical results and comparisons that support our theoretical arguments.

4. Numerical Results and Simulations


We examine a scenario of four users each equipped with a signature of length $L = 15$ and signature cross correlation matrix given by


15  11  11  11 


11  7  7  15 


(14) 


We compare the Bit Error Rate (BER) performance of the input-driven auxiliary-vector receiver with the conventional matched filter, the MOE, and the decorrelating receiver for synchronous CDMA transmission over an AWGN channel. The results are shown in Fig. 1 and Fig. 2. Near-far resistance comparisons are shown in Fig. 3.
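For readers who want to experiment with the receiver defined by (6), (8), (11), (12) and (13), the following is a minimal Monte Carlo sketch. It does not reproduce the experiment reported here: the signatures are hypothetical random ±1 codes rather than the codes behind the cross correlation matrix in (14), the energies and noise level are arbitrary illustrative values, and expectations are replaced by sample averages over the received block.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def received_vectors(codes, energies, sigma, n_bits, rng):
    """Draw r = sum_k sqrt(E_k) b_k S_k + n, as in (2); also return user-0 bits."""
    L = len(codes[0])
    rs, bits0 = [], []
    for _ in range(n_bits):
        bits = [rng.choice((-1, 1)) for _ in codes]
        r = [sum(math.sqrt(e) * b * s[j] for e, b, s in zip(energies, bits, codes))
             + rng.gauss(0.0, sigma) for j in range(L)]
        rs.append(r)
        bits0.append(bits[0])
    return rs, bits0


def auxiliary_vector(rs, s0):
    """Sample-average version of (12)-(13)."""
    acc = [0.0] * len(s0)
    for r in rs:
        a = dot(r, s0)
        perp = [rj - a * sj for rj, sj in zip(r, s0)]   # part of r orthogonal to S0
        scale = math.copysign(1.0, a) / math.sqrt(dot(perp, perp))
        acc = [x + scale * p for x, p in zip(acc, perp)]
    norm = math.sqrt(dot(acc, acc))
    return [x / norm for x in acc]


def steering_scalar(rs, s0, g):
    """Sample-average version of (11)."""
    num = sum(dot(r, s0) * dot(r, g) for r in rs)
    den = sum(dot(r, g) ** 2 for r in rs)
    return num / den


def run(seed=0, L=15, energies=(1.0, 8.0, 9.0, 10.0), sigma=0.3, n_bits=2000):
    rng = random.Random(seed)
    # Hypothetical random +/-1 signatures, normalized to unit energy.
    codes = [[rng.choice((-1.0, 1.0)) / math.sqrt(L) for _ in range(L)]
             for _ in energies]
    rs, bits0 = received_vectors(codes, energies, sigma, n_bits, rng)
    s0 = codes[0]
    g = auxiliary_vector(rs, s0)
    k = steering_scalar(rs, s0, g)
    mf = [dot(r, s0) for r in rs]                       # matched filter output
    av = [dot(r, s0) - k * dot(r, g) for r in rs]       # <r, S0 - kG>

    def ber(out):
        return sum((o >= 0) != (b > 0) for o, b in zip(out, bits0)) / n_bits

    def power(out):
        return sum(o * o for o in out) / n_bits

    return {"s0": s0, "g": g, "k": k,
            "mf_power": power(mf), "av_power": power(av),
            "mf_ber": ber(mf), "av_ber": ber(av)}
```

Calling `run()` returns the estimated auxiliary vector, the steering scalar, and the output power and empirical BER of both filters. Because k is the least-squares minimizer of the sample output power, the auxiliary-vector output power cannot exceed that of the matched filter on the same block; the BER values depend on the random codes drawn and are not meant to match Figs. 1 to 3.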


Fig. 1: Bit error rate as a function of the SNR of the user of interest in the presence of weak interferers (SNR_1 = 2, SNR_2 = 3, SNR_3 = 4).

Fig. 2: Bit error rate as a function of the SNR of the user of interest in the presence of strong interferers (SNR_1 = 8,

Fig. 3: Bit error rate as a function of the Near-Far coefficient (SNR_0 = 1, SNR_1 = 8 x NFC, SNR_2 = 9 x NFC, SNR_3 = 10 x NFC).

5. Conclusions

We reconsidered the concept of multiuser detection for DS/CDMA communication systems from the point of view of Wiener signal reconstruction filters. We identified both the decorrelating detector and the signature matched filter receiver as direct special cases of Wiener signal reconstruction. Generalizing this result we proposed an $L$-tap Wiener adaptive receiver with a powerful inherent MUI canceling property. However, the size of the filter ($L$ taps where $L$ is the system processing gain) appears prohibitive and may restrict severely the practicality of this approach. In view of these observations the natural, low complexity outcome of this line of work is a linear, scalar parameterized, auxiliary-vector receiver. The conceptual and computational simplicity of this receiver promises some immediate practical utility. The optimization can be carried out easily in a variety of different ways. In this work we chose to develop a blind (unsupervised) solution that maximizes the output Signal-to-Interference-plus-Noise Ratio (SINR). In future work we will consider optimization in the minimum probability of error sense (non-least-squares supervised learning [11]).


The newly proposed blind auxiliary-vector receiver compares favorably, both complexity-wise and performance-wise, to the decorrelating detector [2], although the latter utilizes the assumed known signatures of the interferers. This is because the blind maximum SINR criterion, in contrast to the "decorrelating" criterion, strives to achieve the perfect balance between MUI and channel noise suppression. The optimal near-far resistance of the decorrelator appears closely matched by the auxiliary-vector receiver over a wide range of realistic near-far ratios.

In view of these results, the linear, blind, auxiliary-vector filter becomes a candidate for the receiver of choice for the forward link of mobile cellular DS-CDMA communication systems. On the other hand, a bank of blindly optimized auxiliary-vector filters may be deployed as the reverse-link, base-station receiver.
Abstract

In this work, we analyze the performance of the linear minimum mean squared error (LMMSE) estimate-based multiuser detector for CDMA communication systems. Remarkable consistency is achieved between numerical evaluation of our analytical results and computer simulations. We also compare the performance of the LMMSE detector [1] and the adaptive bootstrap multiuser detector [2]. Our analyses and simulations show that even though these two detectors were proposed based on different optimization criteria, they exhibit approximately equal performance in multiuser CDMA communication applications.


1. Introduction

Multiuser separation and interference suppression is an active research topic in the CDMA communications area. Various detectors have been proposed to balance computational simplicity and reliable detection performance [1-6]. Most of the proposed detectors treat the multiuser signal vector as a deterministic vector, and work only on the conditional probability density function (PDF) of the data given the multiuser signal vector. We notice that the information bit vector of the multiple users is actually a random vector with known statistics. Therefore, by incorporating prior knowledge (in a statistical sense) of the multiuser signal vector, one can always improve the overall detection performance [6]. The availability of the prior knowledge depends on the specific communication system. If we constrain our detector to be in the linear class, we can find, as proposed in [1] and there not accurately termed the minimum mean squared error (MMSE) detector, the sub-optimal linear MMSE (LMMSE) detector. In this paper, we further analyze the performance of the LMMSE detector in detail and verify our results through numerical evaluation and computer simulations. A comparative study of the performances of the linear class of decorrelating detector, LMMSE detector, and adaptive bootstrap multiuser detector is also provided in this work.

*This work was supported in part by the Office of Sponsored Research, NJIT, and Rome Air Force Lab under contract F30602-94-C-0135.

2. Problem Formulation

Due to the multiple access (MA) scheme used in CDMA systems, the data $r(t)$ available at the receiver is actually a mixture of multiuser data embedded in additive noise. That is,

$$r(t) = \sum_i \sum_{k=1}^{K} \sqrt{a_k(i)}\, b_k(i)\, s_k(t - iT - \tau_k) + n(t), \qquad (1)$$

where $a_k(i)$, $b_k(i)$, $s_k(t)$, and $\tau_k$ are the bit energy, information bit, signature waveform, and the transmission delay of the $k$th user in the $i$th bit symbol interval (of duration $T$), respectively. $n(t)$ is a white Gaussian process, with two-sided power spectral density of $\sigma^2$.

In this work, we consider the case when $\tau_k = 0$ ($k = 1, \ldots, K$), which corresponds to the synchronous channel. The synchronous channel model is valid for the down-link channel (from base station to mobiles). Note that once the channel is synchronized, all the information bits of the multiple users in the $i$th symbol interval are completely contained in the data $r(t)$ within the $i$th symbol interval. Therefore, we can concentrate on solving the multiuser separation problem within a specific symbol interval, say the $i = 0$th, without loss of generality. After ignoring index $i$, we rewrite the data in (1) into a matrix form,

$$r(t) = S^T(t) \cdot A \cdot b + n(t), \quad 0 \le t \le T, \qquad (2)$$

with $S^T(t) = [\, s_1(t) \;\; s_2(t) \;\; \cdots \;\; s_K(t) \,]$, $A = \mathrm{diag}\{\sqrt{a_1}, \sqrt{a_2}, \ldots, \sqrt{a_K}\}$, and $b = [\, b_1 \;\; b_2 \;\; \cdots \;\; b_K \,]^T$.

At the receiver end, we first filter the data $r(t)$ with a bank of matched filters, whose impulse responses are given by $h_k(t) = s_k(T - t)$, ($k = 1, 2, \ldots, K$). We then stack the outputs of the bank of matched filters, sampled at $t = T$, into a vector and get the following matrix notation,

$$x = P \cdot A \cdot b + n, \qquad (3)$$

where $x = [\, x_1(T) \;\; x_2(T) \;\; \cdots \;\; x_K(T) \,]^T$, with $x_k(T) = r(t) * h_k(t)|_{t=T}$ being the $k$th matched filter output sampled at time instant $t = T$; and $n \sim N(0, \sigma^2 P)$ being the colored Gaussian noise due to the matched filtering. Note that matrix $P$ in (3) is the correlation matrix of the signature waveforms. $P$ is symmetric and positive definite and its elements are given by

$$P_{ij} = \int_0^T s_i(t)\, s_j(t)\, dt \triangleq \rho_{ij}, \quad i = 1, 2, \ldots, K, \;\; j = 1, 2, \ldots, K.$$

In practice, due to the finite bandwidth constraint and the large number of users, the signature waveforms are not ideally orthonormal. Therefore the matrix $P$ will not be an identity matrix in general. The non-diagonal nature of the $P$ matrix causes interference among the users. In order to remove the multiple access interference (MAI), various detectors have been proposed [1-6]. One major goal in proposing these detectors is to balance computational simplicity and reliable detection performance.

3.  Linear  Class  of  Multiuser  Detectors  and 
Their  Performances 

In  this  work,  a  comparative  performance  study  of  vari¬ 
ous  linear  multiuser  detectors  is  conducted,  with  emphasis 
on  LMMSE  detector  and  the  adaptive  bootstrap  multiuser 
detector. 


3.1. Simple Decorrelating Detector

The simple decorrelating detector [4] originates from finding a linear conditional maximum likelihood estimate (MLE) of the signal vector $\theta = A \cdot b$ from the conditional PDF $p(x|\theta)$ obtained from (3). It then detects the multiuser information bit vector $b$ by directly making a decision on the linear conditional MLE $\hat{\theta} = P^{-1} \cdot x$,

$$\hat{b} = \mathrm{sign}\{\hat{\theta}\} = \mathrm{sign}\{P^{-1} \cdot x\}. \qquad (4)$$

This detector has the advantage of structural simplicity. It is also near-far resistant. But a potential problem associated with this detector is that noise is enhanced by the $P^{-1}$ inverse filtering. We have shown in [6] that this detector has limited performance, with an error probability for the $k$th user of

$$P_e(k) = Q\left(\sqrt{\frac{a_k\,(1 - p_k^T P_k^{-1} p_k)}{\sigma^2}}\right) \geq Q\left(\sqrt{\frac{a_k}{\sigma^2}}\right). \qquad (5)$$

Note that in (5), the matrix $P_k$ is a $(K-1) \times (K-1)$ matrix, constructed from the $P$ matrix after removing the contribution of the $k$th user, and $p_k$ is the $k$th column of matrix $P$ with its $k$th entry removed. The inequality $0 < (1 - p_k^T P_k^{-1} p_k) \leq 1$ holds, and the equality holds if and only if $P$ is a diagonal matrix. This corresponds to the case of using a set of perfectly orthonormal signature waveforms. Hence, the performance of the decorrelating detector in (4) is always worse than the BPSK limit, as shown in the inequality in (5). The near-far resistance of the decorrelating detector can be easily seen from its $P_e(k)$ expression in (5), since $P_e(k)$ is invariant to $A_k = \mathrm{diag}\{\sqrt{a_1}, \ldots, \sqrt{a_{k-1}}, \sqrt{a_{k+1}}, \ldots, \sqrt{a_K}\}$.

We further notice that the above decorrelating detector is also a linear estimate-based detector. Part of its limited performance is due to the fact that the linear estimate $\hat{\theta} = P^{-1} x$ is based on the conditional PDF $p(x|\theta)$. Therefore, we can expect to further improve its detection performance and maintain its linear feature by incorporating the joint statistics of both $\theta$ and $x$ into the estimate, as demonstrated in the following linear minimum mean squared error (LMMSE) estimate-based detector.

3.2. Linear Minimum Mean Squared Error (LMMSE) Detector

It is well known that among the linear class of estimates, the LMMSE estimate exhibits the minimum mean squared estimation error. Therefore we can improve the detection performance of the above linear decorrelating detector by deriving an LMMSE estimate-based multiuser detector. Specifically, let us rewrite formula (3) as

$$x = P A b + n = P \theta + n. \qquad (6)$$


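We assume that all the components of the random multiuser information bit vector $b$ are independent and identically distributed (i.i.d.) with zero mean and unit variance. Furthermore, the random vector $\theta$ and noise vector $n$ are statistically independent. For most communication applications, these assumptions are reasonable ones. We then define a new random vector as follows,

$$y \triangleq \begin{bmatrix} \theta \\ x \end{bmatrix} = \begin{bmatrix} A b \\ x \end{bmatrix} = \begin{bmatrix} A & 0 \\ PA & I \end{bmatrix} \begin{bmatrix} b \\ n \end{bmatrix}.$$

The expectation and covariance matrix of the above newly defined vector $y$ can be found as

$$E(y) = 0, \qquad \mathrm{cov}(y) = \begin{bmatrix} A^2 & A^2 P \\ P A^2 & P A^2 P + \sigma^2 P \end{bmatrix} \triangleq \begin{bmatrix} \Sigma_{\theta\theta} & \Sigma_{\theta x} \\ \Sigma_{x\theta} & \Sigma_{xx} \end{bmatrix}.$$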
Given $x$, $E(y)$, and $\mathrm{cov}(y)$, the LMMSE estimate of $\theta$ can then be derived from the following [7,8],

$$\hat{\theta}_{LMMSE} = E(\theta) + \Sigma_{\theta x} \Sigma_{xx}^{-1} (x - E(x)) = \underbrace{(P + \sigma^2 A^{-2})^{-1}}_{W}\, x. \qquad (7)$$

The LMMSE estimate-based multiuser detector makes its decision based on the decision rule,

$$\hat{b} = \mathrm{sign}(\hat{\theta}_{LMMSE}). \qquad (8)$$
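The second equality in (7) skips a short matrix manipulation. One way to fill it in, offered as a sketch that assumes $P$ and $A$ are invertible and uses $E(\theta) = E(x) = 0$, $\Sigma_{\theta x} = A^2 P$ and $\Sigma_{xx} = P A^2 P + \sigma^2 P$, is:

```latex
\begin{align*}
\Sigma_{\theta x}\,\Sigma_{xx}^{-1}
  &= A^{2} P \,\bigl(P A^{2} P + \sigma^{2} P\bigr)^{-1} \\
  &= A^{2} P \,\bigl[(P A^{2} + \sigma^{2} I)\,P\bigr]^{-1} \\
  &= A^{2} P\, P^{-1} \bigl(P A^{2} + \sigma^{2} I\bigr)^{-1} \\
  &= \bigl[(P A^{2} + \sigma^{2} I)\,A^{-2}\bigr]^{-1} \\
  &= \bigl(P + \sigma^{2} A^{-2}\bigr)^{-1} = W .
\end{align*}
```

The fourth line uses $A^2 M^{-1} = (M A^{-2})^{-1}$ for any invertible $M$.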

Note that the diagonal matrix $\sigma^2 A^{-2} = \mathrm{diag}\{\sigma^2/a_1, \sigma^2/a_2, \ldots, \sigma^2/a_K\}$ involved in $W$ of formula (7) is actually the inverse SNR matrix. When the interferences of other users are very small compared to the noise level, $W$ reduces into a diagonal matrix, or $W = (I + \sigma^2 A^{-2})^{-1}$. In this case, the LMMSE detector has the same performance as that of the single user detector, which is the BPSK limit. When the interference levels are very large compared to the noise level, then $W = (P + \sigma^2 A^{-2})^{-1} \approx P^{-1}$, and the LMMSE detector's performance is comparable to that of the decorrelating detector. Therefore, the overall performance of the LMMSE detector is better than that of the decorrelating detector. Furthermore, we calculate the probability of error of the $k$th user based on the following observation. We arrange the order of all the users such that $b^T = [\, b_k \;\; \bar{b}_k^T \,]$. We then decompose the above derived $\hat{\theta}_{LMMSE}$ as follows,

$$\hat{\theta}_{LMMSE} = W x = (P + \sigma^2 A^{-2})^{-1} x = (P + \sigma^2 A^{-2})^{-1} (P A b + n) = \theta - \sigma^2 W A^{-1} b + W n = \theta + e, \qquad (9)$$

where the estimation error, $e = -\sigma^2 W A^{-1} b + W n$, contains both bias and noise components. We will notice later that the improved performance of the LMMSE detector is achieved by trading in a little bias for less noise variance, which finally results in less overall mean squared error (MSE).

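For a given information bit vector $b$, the LMMSE detector of (8) makes an erroneous decision on the $k$th user's information bit $b_k$ whenever

$$e_k < -\sqrt{a_k}, \text{ when } b_k = +1, \quad \text{OR} \quad e_k > +\sqrt{a_k}, \text{ when } b_k = -1.$$

Therefore, the probability of error for the $k$th user can be expressed as

$$P_e(k) = \sum_{\bar{b}_k} P_e(k \,|\, \bar{b}_k)\, P(\bar{b}_k)$$

$$= \frac{1}{2} \sum_{\bar{b}_k} \left\{ P(e_k > \sqrt{a_k} \,|\, b_k = -1, \bar{b}_k) + P(e_k < -\sqrt{a_k} \,|\, b_k = +1, \bar{b}_k) \right\} P(\bar{b}_k)$$

$$= \frac{1}{2} \sum_{\bar{b}_k} \left[ Q\left(\frac{\sqrt{a_k} - R_k(b_k = -1, \bar{b}_k)}{\sigma_k}\right) + Q\left(\frac{\sqrt{a_k} + R_k(b_k = +1, \bar{b}_k)}{\sigma_k}\right) \right] P(\bar{b}_k), \qquad (10)$$

where $\bar{b}_k$ collects the information bits of the other $K-1$ users; $R_k(\cdot) = -\sigma^2 w_k^T A^{-1} b$; $\sigma_k^2 = \sigma^2 w_k^T P w_k$; and $w_k^T$ is the $k$th row of the $W$ matrix.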

We also verified that under various interference conditions, the above error probability expression reduces either into the single user BPSK limit or into that of the decorrelating detector as follows,

$$P_e(k) = \begin{cases} Q\left(\sqrt{a_k / \sigma^2}\right), & \text{small interferences} \\ Q\left(\sqrt{a_k\,(1 - p_k^T P_k^{-1} p_k) / \sigma^2}\right), & \text{strong interferences} \end{cases}$$


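For the case of two users ($K = 2$), the error probability expression in (10) can be simplified as

$$P_e(1) = \frac{1}{2} \left[ Q\left(\frac{\sqrt{a_1} - R_1(-1,+1)}{\sigma_1}\right) + Q\left(\frac{\sqrt{a_1} - R_1(-1,-1)}{\sigma_1}\right) \right],$$

with

$$R_1(-1,+1) = \frac{\sigma^2 \left[ (1 + \sigma^2/a_2)/\sqrt{a_1} + \rho/\sqrt{a_2} \right]}{(1 + \sigma^2/a_1)(1 + \sigma^2/a_2) - \rho^2},$$

$$R_1(-1,-1) = \frac{\sigma^2 \left[ (1 + \sigma^2/a_2)/\sqrt{a_1} - \rho/\sqrt{a_2} \right]}{(1 + \sigma^2/a_1)(1 + \sigma^2/a_2) - \rho^2},$$

$$\sigma_1 = \frac{\sigma \sqrt{(1 + \sigma^2/a_2)^2 - 2\rho^2 (1 + \sigma^2/a_2) + \rho^2}}{(1 + \sigma^2/a_1)(1 + \sigma^2/a_2) - \rho^2}.$$

We can also easily verify the following limit results.

$$\lim_{a_2 \to 0} P_e(1) = Q\left(\frac{\sqrt{a_1}}{\sigma}\right), \quad \text{BPSK limit}$$

$$\lim_{a_2 \to \infty} P_e(1) = Q\left(\frac{\sqrt{a_1 (1 - \rho^2)}}{\sigma}\right), \quad \text{decorrelating detector}$$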
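The two-user expressions can be evaluated directly from the general form (10) as a numerical check on the limits. The sketch below is an illustration only: the function names are hypothetical, the unit-diagonal 2 x 2 correlation matrix with off-diagonal entry rho is an assumption, and the values used in any call are arbitrary rather than taken from the paper's experiments.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lmmse_pe_user1(a1, a2, rho, sigma):
    """Error probability of user 1 from (10) for K = 2, P = [[1, rho], [rho, 1]]."""
    c1 = 1.0 + sigma ** 2 / a1
    c2 = 1.0 + sigma ** 2 / a2
    delta = c1 * c2 - rho ** 2
    w1 = (c2 / delta, -rho / delta)      # first row of W = (P + sigma^2 A^-2)^-1
    # sigma_1^2 = sigma^2 * w1' P w1
    sigma1 = sigma * math.sqrt(w1[0] ** 2 + 2.0 * rho * w1[0] * w1[1] + w1[1] ** 2)
    pe = 0.0
    for b1 in (-1, 1):
        for b2 in (-1, 1):
            # bias R_1(b) = -sigma^2 * w1' A^-1 b
            bias = -sigma ** 2 * (w1[0] * b1 / math.sqrt(a1) + w1[1] * b2 / math.sqrt(a2))
            # b1 = +1: error if e_1 < -sqrt(a1); b1 = -1: error if e_1 > +sqrt(a1)
            pe += 0.25 * q_func((math.sqrt(a1) + b1 * bias) / sigma1)
    return pe


def bpsk_pe(a1, sigma):
    """Single-user BPSK limit."""
    return q_func(math.sqrt(a1) / sigma)


def decorrelator_pe(a1, rho, sigma):
    """Decorrelating detector error probability (5) for K = 2."""
    return q_func(math.sqrt(a1 * (1.0 - rho ** 2)) / sigma)
```

Making the interferer energy `a2` very small should drive `lmmse_pe_user1` toward `bpsk_pe`, and making it very large should drive it toward `decorrelator_pe`, in line with the limit results above.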
4. Examples and Implementation Issues

Numerical examples and computer simulation results further confirm our above derivations. Note that implementation of the LMMSE detector needs knowledge of the matrix $P$ and the SNR matrix, but the adaptive bootstrap multiuser detector developed by Bar-Ness et al. [2] can achieve the same performance as that of the LMMSE detector without this knowledge. In Figure 1(a)(b)(c), we show the performance comparison of the decorrelating detector, the LMMSE detector, and the adaptive bootstrap multiuser detector. We also plot the BPSK limit as a reference lower bound on the performance. It can be seen that there is an equivalence between the LMMSE detector and the adaptive bootstrap multiuser detector. The improved performances of the LMMSE and the adaptive bootstrap multiuser detectors are obtained by trading a little bias for less noise variance, which finally results in a smaller overall mean squared error (MSE). The adaptive bootstrap multiuser detector thus provides a practical implementation of the LMMSE detector for CDMA communication applications.
Figure 1: Performance of the LMMSE detector and the adaptive bootstrap detector in multi-user CDMA system. Also shown in figure are the results of the decorrelating
detector  and  the  BPSK  limit.  Parameters  used:  SNRi  = 
SdB,K  =  3,M  =  1000000  independent  trials. 
ABSTRACT

We previously presented [1] a blind 2D RAKE receiver for CDMA that cancels strong multi-user access interference (MAI) and optimally combines multipath. After passing the output of each antenna through a matched filter based on the spreading waveform of the desired user, one estimates the signal plus interference spatio-frequency correlation matrix during that portion of the bit interval where the fingers of the RAKE occur, and the interference alone spatio-frequency correlation matrix during that portion of the bit interval away from the fingers. A reduced complexity scheme that outperforms the previous algorithm in a MAI dominant environment is presented based on a data adaptive transformation to a beamspace of dimension equal to the effective number of spatial degrees of freedom taken up by the desired user's multipath.


1. INTRODUCTION

In [1] we presented a blind space-time processing scheme
for a Direct Sequence Spread Spectrum based CDMA
PCS/cellular communications system that cancels co-channel
interference while simultaneously combining multipath
in an optimal “RAKE-like” fashion. After passing
the output of each antenna through a matched filter based
on the spreading waveform of the desired user, one estimates
the signal plus interference spatio-temporal correlation
matrix during that portion of the bit interval where
the fingers of the RAKE occur, and the interference alone
spatio-temporal correlation matrix during that portion of
the bit interval away from the fingers. It was shown that
the weight vector yielding the optimum signal to interference
plus noise ratio for bit decisions is the “largest”
generalized eigenvector of the resulting matrix pencil.

Taking a cue from either the IS-95 standard or the coarse
acquisition code embedded in the GPS signal, suppose the
chip duration is 1 microsecond. Experimental measurements
in an urban cellular environment reveal that the
worst case time delay spread, τ_max, due to multipath is
on the order of 10 microseconds [2]. Sampling two times
per chip, the resulting space-time correlation matrix would
be of dimension 20N × 20N, where N is the number of
antennas: 120 × 120 in the case of N = 6 antennas. The
large dimensionality of the spatio-temporal correlation matrix
pencil prompted an investigation into a frequency domain
implementation of a RAKE receiver in [1].
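
The dimension quoted above follows from simple arithmetic. The sketch below reproduces it; the function and parameter names are illustrative and do not come from the paper.

```python
def space_time_dimension(chip_duration_us, delay_spread_us,
                         samples_per_chip, num_antennas):
    """Side length of the space-time correlation matrix.

    The delay spread covers delay_spread_us / chip_duration_us chips,
    each chip contributes samples_per_chip time samples, and every
    time sample is observed at num_antennas antennas.
    """
    chips_spanned = delay_spread_us / chip_duration_us
    time_samples = int(round(chips_spanned * samples_per_chip))
    return time_samples * num_antennas


# Values from the text: 1 us chips, 10 us delay spread,
# two samples per chip, N = 6 antennas.
dim = space_time_dimension(1.0, 10.0, 2, 6)
print(f"space-time correlation matrix: {dim} x {dim}")
```

With N left symbolic the same product gives 20N, which is the figure the frequency domain implementation is meant to reduce.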

The primary advantage of a frequency domain implementation
of a RAKE receiver is that we can select only
frequency values within the mainlobe of the spectrum of
the autocorrelation function of the spreading waveform, and
the number of such values required for “good” performance
can be substantially less than the number of time samples
recorded during the multipath time delay spread. The result
is that the spatio-frequency correlation matrix is of
significantly smaller dimension than the space-time correlation
matrix with no degradation in performance. Another
possible advantage is that the frequency domain implementation
may allow for a lower sampling rate than space-time
processing since we can allow aliasing in the sidelobes of
the autocorrelation function of the spreading waveform as
no frequency samples from there are used.

This research was supported by the Air Force Office of Scientific
Research under grant no. F49620-95-1-0367, the National
Science Foundation under grant no. MIPS-9320890, and
the Army Research Office's Focused Research Initiative under
grant number DAAH04-95-1-0246.

Extensive simulations have revealed that taking roughly
10 frequency samples equi-spaced between minus half the
chip rate and plus half the chip rate provides “good” performance.
Thus, the spatio-frequency correlation matrices are
roughly of dimension 10N × 10N regardless of the value of
τ_max: 60 × 60 in the case of N = 6 antennas. The motivation
for this paper is twofold. First, the dimension of the
spatio-frequency snapshot is quite large for a reasonable number
of antennas. Second, a spatial null is required to cancel
each strong MAI since each MAI is a broadband interferer.
Thus, the number of MAI's that can be canceled is limited
by the number of antennas, N, and not the dimension of the
spatio-frequency snapshot vectors, 10N. This paper theoretically
analyzes a reduced complexity scheme, originally
proposed in [3], based on a data adaptive transformation
to a beamspace of dimension equal to the effective number
of spatial degrees of freedom taken up by the desired user's
multipath. Low dimension spatio-frequency correlation matrices
are formed in the reduced dimension beamspace. We
begin the development with the space-time data model.

2.  SPACE-TIME  DATA  MODEL 

The N × 1 array snapshot vector x(t) containing the outputs
of each of the N antennas comprising the array at
time t is modeled as

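$$
\mathbf{x}(t) = \sum_{k=1}^{K} \sum_{n=0}^{N_b-1} \rho_k\, \mathbf{a}(\theta_k)\, D(n)\, c(t - nT_b - \tau_k)
+ \sum_{i=1}^{J} \sum_{n=0}^{N_b-1} \sigma_i\, \mathbf{a}(\theta_i)\, D_i(n)\, c_i(t - nT_b) + \mathbf{n}_w(t) \tag{1}
$$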

where a(θ) is the spatial response of the array. For the sake
of notational simplicity, we here assume that the spatial response
vector depends on a single directional parameter, θ,
the direction of arrival (DOA) of a given source. However,
no model is assumed for a(θ) in the algorithm to be presented;
the algorithm works for any array geometry. 1/T_b is
the symbol rate. K is the number of different paths the Signal
of Interest (SOI) arrives from, θ_k denotes the direction
associated with the k-th path, and τ_k is the corresponding
relative delay of the k-th path. ρ_k is the complex amplitude
of the k-th multipath arrival for the SOI at the reference element.
D(n) and D_i(n) are the digital information sequences
for the SOI and MAI sources, respectively. J broadband
interferers (MAI) impinge upon the array. σ_i is the complex
amplitude of the i-th interferer at the reference element of
the array. c(t) and c_i(t) are the spreading waveforms for
the SOI and i-th MAI, respectively. The vector n_w(t) contains
white noise. N_b is the number of bits over which all
parameters characterizing the model in (1) are assumed to
be constant. N_b might be quite small in cases of rapidly
evolving dynamics.

The spreading waveform for the i-th MAI is modeled as

$$
c_i(t) = \sum_{m=0}^{N_c-1} d_i(m)\, p_c(t - mT_c) \tag{2}
$$

where 1/T_c is the chip rate, d_i(m) is a pseudo-noise (PN)
sequence,¹ p_c(t) is the chip waveform assumed common to
all sources, and N_c is the number of chips per bit common to
all MAI's. The spreading waveform for the desired source,
c(t), is defined similarly but with a different PN sequence.

The received signal at each antenna is sampled at a rate
f_s = L_c/T_c, where L_c is the number of samples per chip.
The sampled output of each antenna is passed through a
filter with impulse response h[n] = c[−n], where c[n] =
c(nT_c/L_c). Let X_F(n) denote the N × N_c L_c matrix whose
j-th row contains N_s = N_c L_c samples of the output of the
j-th antenna after the matched filter for the n-th (i.e., one)
bit period. Given the model for x(t) in (1), X_F(n) may be
expressed as

$$
\mathbf{X}_F(n) = D(n)\,\mathbf{A}\,\mathbf{T} + \mathbf{A}_I \boldsymbol{\Sigma}_I \mathbf{B}(n)\,\mathbf{P} + \mathbf{N}(n) \tag{3}
$$

A is an N × K matrix whose K columns are a(θ_k),
k = 1, …, K. T is the K × N_s (recall N_s = N_c L_c) matrix
given by

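$$
\mathbf{T} = \begin{pmatrix}
r_{cc}(-\tau_1) & r_{cc}\!\left(\frac{T_c}{L_c} - \tau_1\right) & \cdots & r_{cc}\!\left((N_s-1)\frac{T_c}{L_c} - \tau_1\right) \\
r_{cc}(-\tau_2) & r_{cc}\!\left(\frac{T_c}{L_c} - \tau_2\right) & \cdots & r_{cc}\!\left((N_s-1)\frac{T_c}{L_c} - \tau_2\right) \\
\vdots & \vdots & & \vdots \\
r_{cc}(-\tau_K) & r_{cc}\!\left(\frac{T_c}{L_c} - \tau_K\right) & \cdots & r_{cc}\!\left((N_s-1)\frac{T_c}{L_c} - \tau_K\right)
\end{pmatrix} \tag{4}
$$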

where r_cc(τ) is the autocorrelation function for the SOI's
spreading waveform, c(t). In the case where the chip waveform,
p_c(t), is rectangular and the processing gain is large,
r_cc(τ) has the following triangular shape:


$$
r_{cc}(\tau) = \begin{cases}
1 - \dfrac{|\tau|}{T_c} & \text{if } |\tau| \le T_c \\
0 & \text{if } |\tau| > T_c
\end{cases} \tag{5}
$$
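
The triangular autocorrelation and the rows of T can be sketched as follows. This is a minimal illustration that assumes a unit-peak triangle with support |τ| ≤ T_c; the names `r_cc` and `build_T` are hypothetical helpers, not the authors' code.

```python
def r_cc(tau, chip_duration):
    """Unit-peak triangular autocorrelation of a rectangular chip."""
    return max(0.0, 1.0 - abs(tau) / chip_duration)


def build_T(path_delays, chip_duration, samples_per_chip, num_samples):
    """K x Ns matrix whose k-th row samples r_cc delayed by tau_k.

    Entry (k, m) is r_cc(m * Tc / Lc - tau_k) for m = 0, ..., Ns - 1.
    """
    step = chip_duration / samples_per_chip
    return [[r_cc(m * step - tau_k, chip_duration)
             for m in range(num_samples)]
            for tau_k in path_delays]


# Two paths: direct (delay 0) and one delayed by 2 chips,
# with Tc = 1 and Lc = 2 samples per chip.
T = build_T([0.0, 2.0], chip_duration=1.0, samples_per_chip=2,
            num_samples=8)
for row in T:
    print(row)
```

Each row is the same triangle shifted to its path delay, which is why the SOI energy is confined to the first few columns once the delays are bounded by the delay spread.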


Σ_I is a J × J diagonal matrix containing the amplitudes
(at the reference element), σ_i, i = 1, …, J, for each of the
MAI's. B(n) is a J × J diagonal matrix containing the bit
values for the J MAI's. The columns of the N × J matrix
A_I are a(θ_i), i = 1, …, J. P is the J × N_s matrix

$$
\mathbf{P} = \begin{pmatrix} c_1[n] * h[n] \\ \vdots \\ c_J[n] * h[n] \end{pmatrix} \tag{6}
$$

where c_i[n] = c_i(nT_c/L_c) and * is the linear convolution
operator truncated at N_s samples. N(n) is the noise contribution;
the rows of N(n) are independent Gaussian processes,
but the individual components of each row are correlated
because of the matched filtering operation.

¹Without loss of generality, real-valued spreading waveforms
have been assumed for notational simplicity.


3.  SPACE-FREQUENCY  DATA  MODEL 

Define the N_s × N_w selection matrix Γ^(i) as

$$
\boldsymbol{\Gamma}^{(i)} = \begin{bmatrix}
\mathbf{0}_{(i-1) \times N_w} \\
\mathbf{I}_{N_w \times N_w} \\
\mathbf{0}_{(N_s - i - N_w + 1) \times N_w}
\end{bmatrix} \tag{7}
$$

where

$$
N_w = \frac{L_c \tau_{\max}}{T_c} \tag{8}
$$

and τ_max is the worst case time delay spread due to multipath.
Without loss of generality, we have chosen τ_max such
that

$$
N_m = \frac{\tau_{\max}}{T_c} \tag{9}
$$

is an integer. The important thing is that τ_max be an upper
bound on the experimentally measured worst case multipath
time delay. If p_c(t) is one microsecond in duration, a
reasonable number for N_m is 10 in an urban cellular environment.

Define the i-th spatio-frequency snapshot for the n-th bit as

$$
\mathbf{y}^{(i)}(n) = \mathrm{vec}\!\left(\mathbf{X}_F(n)\, \boldsymbol{\Gamma}^{(i)}\, \mathbf{W}\right) \tag{10}
$$

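The N_w × L matrix W is composed of L < N_w columns of
the N_w point DFT matrix; the ℓ-th column of W has the
form

$$
\mathbf{w}_\ell = \left[1,\; e^{-j2\pi \ell / N_w},\; \ldots,\; e^{-j2\pi \ell (N_w - 1)/N_w}\right]^T \tag{11}
$$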


Note that the analog frequency separation between adjacent
spectral lines is

$$
\Delta f = \frac{L_c}{N_w T_c} = \frac{1}{\tau_{\max}} \tag{12}
$$


Finally, vec(·) is the operator that maps an N × L matrix to
an NL × 1 vector by concatenating its columns. Summarizing
the steps implied by (10), the procedure is to compute
L spectral lines of the rows of X_F(n) over N_w = N_m L_c
samples starting at the i-th column, and then stack the
resulting L N × 1 vectors in an NL × 1 vector.
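
The procedure just summarized can be sketched in a few lines. The sketch assumes the usual DFT sign convention exp(−j2πℓm/N_w) and a 0-based starting column; `spatio_frequency_snapshot` and its arguments are illustrative names only.

```python
import cmath


def spatio_frequency_snapshot(X, start, window, bins):
    """Stack L spectral lines of every antenna row into an N*L vector.

    X      : list of N rows (post matched filter samples, one per antenna)
    start  : 0-based index of the first column of the analysis window
    window : number of samples Nw in the window
    bins   : the L DFT bin indices to keep (negative indices allowed)
    """
    snapshot = []
    for ell in bins:            # one N x 1 block per spectral line
        for row in X:           # antennas within the block
            acc = 0j
            for m in range(window):
                acc += row[start + m] * cmath.exp(
                    -2j * cmath.pi * ell * m / window)
            snapshot.append(acc)
    return snapshot


# Toy data: N = 2 antennas, Nw = 4 samples, L = 2 bins.
X = [[1, 1, 1, 1, 5, 5],
     [1, -1, 1, -1, 5, 5]]
y = spatio_frequency_snapshot(X, start=0, window=4, bins=[0, 2])
print(y)
```

The outer loop runs over frequency bins so that the result matches vec(·) applied to an N × L matrix, that is, columns are concatenated.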

Let us examine the structure of the NL × 1 spatio-frequency
snapshot vector formed by substituting the form
of X_F(n) in (3) into (10). Assuming the processing gain,
i.e., the number of chips per bit, N_c, to be large, and that we
have approximate bit synchronization for the desired user,
the term corresponding to the SOI is only nonzero during
the first N_m chips associated with index i = 1. Assume
r_cc(τ) to have the triangular shape described by (5) corresponding
to a rectangular chip waveform. In this case,
each row of T is simply a sampled version of a time-delayed
replica of (5), delayed by τ_i where i is the row index. Using
simple properties of the Fourier Transform it follows that

$$
\mathbf{y}_S(n) = \mathrm{vec}\!\left(D(n)\,\mathbf{A}\,\mathbf{T}\,\boldsymbol{\Gamma}^{(1)}\mathbf{W}\right) = D(n)\,\mathrm{vec}(\mathbf{A}\boldsymbol{\Phi}\boldsymbol{\Gamma}) \tag{13}
$$

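where Γ is an L × L diagonal matrix: the i-th diagonal
entry of Γ is of the form

Γ(i) = …   (14)

The i-th column of the K × L matrix Φ is of the form

$$
\boldsymbol{\phi}_i = \left[ e^{-j2\pi i \tau_1/\tau_{\max}},\; e^{-j2\pi i \tau_2/\tau_{\max}},\; \ldots,\; e^{-j2\pi i \tau_K/\tau_{\max}} \right]^T \tag{15}
$$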


It is instructive to examine the asymptotic structure of
the LN × LN spatio-frequency correlation matrix of y_S(n),
K_S = E[y_S(n) y_S^H(n)]. Since the only time varying quantity
is D(n), it follows that

$$
\mathbf{K}_S = \sigma_D^2\, \mathbf{d}\mathbf{d}^H, \quad \text{where: } \mathbf{d} = (\boldsymbol{\Gamma} \otimes \mathbf{A}\boldsymbol{\Sigma}_S)\,\mathrm{vec}(\boldsymbol{\Phi}) \tag{16}
$$

where σ_D² = E[D²(n)]. Σ_S is a K × K diagonal matrix
containing the complex amplitudes of the K multipaths for
the SOI. The result in (16) was obtained by invoking the
property vec(ADB) = (B^T ⊗ A)vec(D). Note that K_S is
a rank one matrix.
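
The vec property invoked for (16) is easy to check numerically. The following pure-Python sketch uses small hand-picked matrices; the helper names are illustrative and the matrices are not taken from the paper.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def vec(A):
    """Stack the columns of A into a single list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


A = [[1, 2], [3, 4]]                     # 2 x 2
D = [[1j, 0, 2], [1, -1, 3]]             # 2 x 3
B = [[1, 0], [2, 1], [0, -1j]]           # 3 x 2

lhs = vec(matmul(matmul(A, D), B))       # vec(ADB)
K = kron(transpose(B), A)                # B^T (x) A, size 4 x 6
x = vec(D)
rhs = [sum(k * v for k, v in zip(row, x)) for row in K]

print(lhs == rhs)
```

Note that the identity uses the plain transpose of B, not the conjugate transpose, which matters here because the entries are complex.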

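The energy contribution of the MAI's to a given bit period
is approximately evenly spread across the entire bit
period. Define

$$
\mathbf{y}_I^{(i)}(n) = \mathrm{vec}\!\left(\mathbf{A}_I \boldsymbol{\Sigma}_I \mathbf{B}(n)\,\mathbf{P}\,\boldsymbol{\Gamma}^{(i)}\mathbf{W}\right) \tag{17}
$$

It can be shown that the MAI spatio-frequency correlation
matrix K_I = E[y_I^(i)(n) y_I^(i)H(n)] may be expressed as

$$
\mathbf{K}_I = E\!\left[\mathrm{vec}\!\left(\mathbf{A}_I \boldsymbol{\Sigma}_I \mathbf{B}(n)\mathbf{P}\boldsymbol{\Gamma}^{(i)}\mathbf{W}\right) \mathrm{vec}^H\!\left(\mathbf{A}_I \boldsymbol{\Sigma}_I \mathbf{B}(n)\mathbf{P}\boldsymbol{\Gamma}^{(i)}\mathbf{W}\right)\right]
= \mathbf{R}_F \otimes \mathbf{A}_I \boldsymbol{\Sigma}_I^2 \mathbf{A}_I^H \tag{18}
$$

where we have exploited the following four properties: (i)
vec(ADB) = (B^T ⊗ A)vec(D), (ii) (A ⊗ B)(C ⊗ D) =
(AC) ⊗ (BD), (iii) E{B(n)B^H(n)} = I_J (the
data from different sources are assumed to be uncorrelated),
and (iv) E[d_i(m)d_j(l)] = δ_ij δ_ml, i.e., the chip values
comprising each PN sequence are modeled as independent
and identically distributed. R_F = W^T T_I W*, where
T_I is a Toeplitz-symmetric matrix whose first column is
r_cc(mT_c/L_c), m = 0, …, N_w − 1. Note that R_F is full rank
and A_I is rank J. It follows that K_I is rank LJ.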

Regarding the noise contribution, a similar development
reveals that the spatio-frequency correlation matrix for
y_N^(i)(n) = vec(N(n)Γ^(i)W) has the asymptotic form

$$
\mathbf{K}_N = \sigma_n^2\, \mathbf{R}_N \otimes \mathbf{I}_N \tag{19}
$$

where R_N is an L × L matrix. R_N may be expressed as
R_N = W^T T_N W*, where T_N is a Toeplitz-symmetric matrix
whose first column is r_cc(mT_c/L_c), m = 0, …, N_w − 1.

4.  BLIND  ADAPTIVE  2D  RAKE  RECEIVER 

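The signal plus interference plus noise spatio-frequency
correlation matrix is estimated as

$$
\mathbf{K}_{S+I+N} = \frac{1}{N_b} \sum_{n=0}^{N_b-1} \mathbf{y}^{(1)}(n)\, \mathbf{y}^{(1)H}(n) \tag{20}
$$

Observe that only one spatio-frequency snapshot is extracted
from each bit in forming K_{S+I+N}. The interference plus
noise spatio-frequency correlation matrix is estimated as

$$
\mathbf{K}_{I+N} = \frac{1}{N_b (N_s - 1)} \sum_{n=0}^{N_b-1} \sum_{i=2}^{N_s} \mathbf{y}^{(i)}(n)\, \mathbf{y}^{(i)H}(n) \tag{21}
$$

In the development below we show that the optimum set
of spatio-frequency weights for weighting and summing the
frequency samples computed in the vicinity of the fingers
of the RAKE is the “largest” generalized eigenvector of the
matrix pencil {K_{S+I+N}, K_{I+N}}.

We here show that the asymptotic spatio-frequency
weight vector w obtained as the “largest” generalized eigenvector
of the matrix pencil {K_{S+I+N}, K_{I+N}} corresponds
to the solution to the Minimum Variance Distortionless Response
(MVDR) problem

$$
\min_{\mathbf{w}}\; \mathbf{w}^H \mathbf{K}_{I+N} \mathbf{w} \quad \text{subject to: } \mathbf{w}^H \mathbf{d} = 1 \tag{22}
$$

where d is the NL × 1 vector defined in (16).

The “largest” generalized eigenvector of the pencil
{K_{S+I+N}, K_{I+N}} is the solution to the unconstrained
optimization problem

$$
\max_{\mathbf{w}} \frac{\mathbf{w}^H \mathbf{K}_{S+I+N} \mathbf{w}}{\mathbf{w}^H \mathbf{K}_{I+N} \mathbf{w}}
\;\Longleftrightarrow\;
\max_{\mathbf{w}} \frac{\mathbf{w}^H \sigma_D^2 \mathbf{d}\mathbf{d}^H \mathbf{w}}{\mathbf{w}^H \mathbf{K}_{I+N} \mathbf{w}} \tag{23}
$$

where we have invoked the fact that K_S is rank one. It
follows that

$$
\max_{\mathbf{w}} \left\{ \frac{\mathbf{w}^H \sigma_D^2 \mathbf{d}\mathbf{d}^H \mathbf{w}}{\mathbf{w}^H \mathbf{K}_{I+N} \mathbf{w}} \right\}
\;\Longleftrightarrow\;
\min_{\mathbf{w}} \left\{ \frac{\mathbf{w}^H \mathbf{K}_{I+N} \mathbf{w}}{\mathbf{w}^H \sigma_D^2 \mathbf{d}\mathbf{d}^H \mathbf{w}} \right\} \tag{24}
$$

which is equivalent to the constrained minimization problem
in (22). The “inversion” of the maximization problem
resulting in the equivalent minimization problem is possible
since K_{I+N} is a positive-definite matrix.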
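
The equivalence argued above can be checked on a toy example. The sketch below is an illustration under stated assumptions: a 3-dimensional problem, one strong interferer plus unit white noise, and made-up vectors `d` and `a`. It verifies that w = K_{I+N}⁻¹ d satisfies the generalized eigen-equation of the pencil and, after scaling, the MVDR constraint. Because K_S is rank one, every other generalized eigenvalue of the pencil equals 1, so the eigenvalue found here is the largest.

```python
def solve(K, d):
    """Solve K x = d by Gauss-Jordan elimination with partial pivoting."""
    n = len(d)
    M = [list(row) + [rhs] for row, rhs in zip(K, d)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def matvec(K, x):
    return [sum(k * v for k, v in zip(row, x)) for row in K]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def outer(x, y, scale=1.0):
    """scale * x y^H."""
    return [[scale * a * b.conjugate() for b in y] for a in x]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


n = 3
d = [1 + 0j, 1j, -1 + 0j]            # hypothetical SOI signature
a = [1 + 0j, -1 + 0j, 1 + 0j]        # hypothetical interferer signature
eye = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
K_in = add(outer(a, a, 100.0), eye)  # interference plus noise
sigma2 = 2.0
K_sin = add(outer(d, d, sigma2), K_in)

w = solve(K_in, d)                   # proportional to the MVDR weights
lam = 1 + sigma2 * inner(d, w).real  # predicted generalized eigenvalue
residual = max(abs(l - lam * r)
               for l, r in zip(matvec(K_sin, w), matvec(K_in, w)))
w_mvdr = [x / inner(d, w) for x in w]  # scaled so that w^H d = 1
print(lam, residual, inner(w_mvdr, d))
```

In practice a library generalized eigensolver would replace the hand-written elimination; the point here is only the structural identity.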

We now prove that each sub-vector of the optimum weight
vector corresponding to a particular frequency bin is orthogonal
to each column of A_I. Equation (18) reveals that
the spatio-frequency correlation matrix of the MAI is a Kronecker
product of the L × L matrix R_F with the N × N matrix
A_I Σ_I² A_I^H. Let E_F be an L × L matrix whose columns
are the eigenvectors of R_F. Since R_F is a full rank Hermitian
matrix, it follows that E_F E_F^H = E_F^H E_F = I_L. However,
A_I Σ_I² A_I^H is only rank J. Let E_S be an N × J
matrix whose columns are the eigenvectors of A_I Σ_I² A_I^H
associated with the J nonzero eigenvalues. From signal subspace
theory, it follows that E_S = A_I T where T is a J × J
full rank matrix. It was noted previously that K_I is of rank
LJ. Let E_I be an LN × LJ matrix whose columns are the
eigenvectors of K_I associated with the LJ nonzero eigenvalues.
It follows from the theory of Kronecker products
that E_I = E_F ⊗ E_S.

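Post-multiplying E_I by a full rank LJ × LJ matrix yields
a matrix whose range space is the same as that of E_I.
Post-multiplying E_I by the full rank matrix E_F^H ⊗ T⁻¹ yields

$$
\mathbf{G}_I = \mathbf{E}_I\left(\mathbf{E}_F^H \otimes \mathbf{T}^{-1}\right) = \left(\mathbf{E}_F \otimes \mathbf{E}_S\right)\left(\mathbf{E}_F^H \otimes \mathbf{T}^{-1}\right) \tag{25}
$$

$$
= \mathbf{E}_F \mathbf{E}_F^H \otimes \mathbf{E}_S \mathbf{T}^{-1} = \mathbf{I}_L \otimes \mathbf{A}_I \tag{26}
$$

where we have used the Kronecker product property
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).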

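The projection operator P_I onto the range space of K_I
may be expressed as

$$
\mathbf{P}_I = \mathbf{E}_I \mathbf{E}_I^H = \mathbf{G}_I\left(\mathbf{G}_I^H \mathbf{G}_I\right)^{-1}\mathbf{G}_I^H = \mathbf{I}_L \otimes \mathbf{P}_{A_I} \tag{27}
$$

where P_{A_I} is the projection operator (N × N) onto the
subspace generated by the columns of A_I.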

The optimum weight vector is given by the Wiener solution
as w_opt = K_{I+N}⁻¹ d. When the power of the MAI's
is much stronger than the receiver generated noise,
the optimum weight vector is well approximated by w_opt ≈
d − P_I d. Substituting the expression for P_I in (27) in
d − P_I d yields the following asymptotic expression for the
optimum weight vector:

$$
\mathbf{w}_{opt} \approx \begin{pmatrix} \mathbf{d}_1 - \mathbf{P}_{A_I}\mathbf{d}_1 \\ \vdots \\ \mathbf{d}_L - \mathbf{P}_{A_I}\mathbf{d}_L \end{pmatrix}
$$

where d_i is the i-th N × 1 sub-vector of the NL × 1 vector
d defined in (16). It follows that the i-th N × 1 sub-vector
of w_opt is orthogonal to each column of A_I, i.e., the array
pattern obtained at each frequency bin exhibits a spatial null
in the direction of each and every MAI.
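
The per-bin nulling can be illustrated for the simplest case of a single interferer (J = 1), where the projector onto the interference subspace reduces to a a^H / (a^H a). The array geometry (half-wavelength uniform line array), the 30° interferer angle, and the vector `d` below are assumptions chosen only for illustration.

```python
import cmath
import math


def steering(num_antennas, angle_deg):
    """Response of a half-wavelength uniform line array (assumed geometry)."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_antennas)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def remove_component(d_sub, a):
    """Return d_sub - P_a d_sub, with P_a the projector onto span{a}."""
    scale = inner(a, d_sub) / inner(a, a)
    return [x - scale * y for x, y in zip(d_sub, a)]


N, L = 4, 3                            # antennas, frequency bins
a_I = steering(N, 30.0)                # single interferer direction
d = [complex(k + 1, -k) for k in range(N * L)]   # arbitrary test vector

w = []
for i in range(L):                     # process one frequency bin at a time
    w.extend(remove_component(d[i * N:(i + 1) * N], a_I))

null_depths = [abs(inner(a_I, w[i * N:(i + 1) * N])) for i in range(L)]
print(null_depths)
```

Every entry of `null_depths` is zero up to rounding, which is the statement that each frequency bin's spatial weights place a null on the interferer. For J > 1 the scalar division would be replaced by the inverse of A_I^H A_I.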

5.  BLIND  SPATIAL  PRE-PROCESSING 

The spatio-frequency MVDR processor should accomplish
two tasks: (i) it should null the MAI's and (ii) it should optimally
combine the fingers corresponding to the different
paths of the SOI. The optimum weight vector is the “largest”
generalized eigenvector of an NL × NL spatio-frequency
correlation matrix pencil. This large dimensionality translates
into a correspondingly large computational burden, detracting
from the real-time applicability of the scheme and
slowing convergence as well.

We here restrict our attention to a scenario where the
MAI's are the primary source of interference. Since canceling
each broadband MAI consumes L degrees of freedom
(so that J spatial nulls are formed towards the MAI's at
each of the L frequency bins), the interference rejection capability
of the algorithm is not diminished if it is divided
into two stages: spatial-only pre-processing with N degrees
of freedom to cancel the MAI's, followed by blind MVDR
spatio-frequency processing in the reduced dimension beamspace.

In this scheme, we first transform to a p < N dimensional
beamspace using the p “largest” generalized eigenvectors
of the N × N spatial correlation matrix pencil
{R_{S+I+N}, R_{I+N}}. Here p is the number of dominant multipaths
for the SOI that are resolvable in space. R_{S+I+N}
is formed from snapshots measured in the vicinity of the
“fingers,” while R_{I+N} is formed from snapshots measured
away from the “fingers.”


6.  SIMULATION 

A simulation was conducted employing a six element
uniformly-spaced linear array with half-wavelength spacing.
Both the desired source and the interferer were DS-CDMA
signals with different Gold codes and 127 chips per bit; the
duration of a chip was one microsecond. The modulation
overlay was BPSK. A simple two-ray multipath model was
used for the desired source wherein the direct path arrived
at an elevation angle of 0° relative to broadside with an
SNR of 10 dB per element. The second ray arrived at an
angle of 10° with a relative delay of 2 chips and an SNR 6
dB below that of the direct path and phase shifted by 45°
at the array center. The interferer was modeled as arriving
at a single discrete angle, 30° elevation, with an SNR of 30
dB per element. There were two samples per chip.

The beam patterns obtained with the weight vectors of the
first stage, computed as the two “largest” generalized eigenvectors
of the matrix pencil {R_{S+I+N}, R_{I+N}}, are plotted
in Figure 1. Both patterns are observed to peak near the respective
angular directions of the multipath arrivals for the
desired user, and have a deep null in the direction of the
interferer. The two “largest” generalized eigenvectors were
employed to transform to a 2-dimensional beamspace. Applying
the spatio-frequency processing scheme to the outputs
of the p = 2 respective beams yields the signal constellation
plotted in Figure 2.

Comparing the computational load of this two-stage procedure
with spatio-frequency processing in the original element
space, the latter requires the computation of the
“largest” generalized eigenvector of a 60 × 60 matrix pencil.
In contrast, the former requires computation of the two
“largest” generalized eigenvectors of a 6 × 6 spatial matrix
pencil followed by the computation of the “largest” generalized
eigenvector of a 20 × 20 spatio-frequency correlation
matrix pencil in beamspace. Moreover, with regard to performance,
the two-stage procedure offers faster convergence as
there are many more spatial snapshots per bit for forming
R_{S+I+N} and R_{I+N} than spatio-frequency snapshots per bit
for forming K_{S+I+N} and K_{I+N}. Coupled with the fact that
R_{S+I+N} and R_{I+N}, as well as the beamspace matrices
𝒦_{S+I+N} and 𝒦_{I+N}, are of much smaller dimension than
K_{S+I+N} and K_{I+N}, there is
a substantial decrease in the error of the estimate of the optimum
space-frequency weight vector in beamspace relative
to that in element space.
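
A rough feel for the saving can be had by assuming that a dense n × n generalized eigenproblem costs on the order of n³ operations. That cubic model is an assumption made here for illustration; the paper itself only quotes the matrix sizes.

```python
def eig_cost(n):
    """Crude operation-count proxy for an n x n eigenproblem (assumed n^3)."""
    return n ** 3


N, L, p = 6, 10, 2                       # antennas, frequency bins, beams

element_space = eig_cost(N * L)          # one 60 x 60 pencil
beamspace = eig_cost(N) + eig_cost(p * L)  # 6 x 6 pencil, then 20 x 20 pencil
ratio = element_space / beamspace

print(element_space, beamspace, round(ratio, 1))
```

Under this model the two-stage procedure is cheaper by more than an order of magnitude, and the advantage grows with the number of antennas since only the small spatial stage scales with N.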
Abstract

The  use  of  array  antennas  in  the  radio  nodes  of  a  wireless 
network  will  increase  the  capacity  since  by  appropriate  null 
placement,  the  cochannel  interference  can  be  suppressed 
and  the  channel  reuse  will  be  enhanced.  A  radio  network 
with  arbitrary  topology  and  array  antennas  in  the  nodes  is 
considered  in  this  paper.  Each  radio  node  may  act  either 
as  a  receiver  or  as  a  transmitter  and  can  place  a  number  of 
nulls  in  its  radiation  pattern.  A  collection  of  links  may  share 
the  same  channel  if  the  interfering  transmitters  within  the 
range  of  each  receiver  are  cancelled  out  by  null  placement 
either  at  the  receiver  or  at  the  transmitter  side.  The  fol¬ 
lowing  problem  is  considered.  Given  a  collection  of  links, 
identify  a  null  placement  configuration  such  that  the  links 
share  the  same  channel  without  interference.  It  is  shown  that 
such  a  null  placement  configuration  can  be  found  by  solv¬ 
ing  a  maximum  flow  problem  in  an  appropriately  defined 
capacitated  network. 


1.  Introduction 

Array  antennas  with  linear  combining  of  the  element  out¬ 
puts  have  been  studied  extensively  over  the  past  decades. 
By  controlling  the  weights  in  the  linear  combination  of 
the  element  outputs,  the  radiation  pattern  of  the  antenna 
can  be  designed  to  have  certain  desirable  characteristics 
like  a  high  gain  narrow  beam  in  the  direction  of  the  in¬ 
tended  communicator  and  nulls  in  the  directions  of  jam¬ 
mers/interceptors.  Efficient  signal  processing  algorithms 
for  direction  of  arrival  estimation,  adaptive  beam  and  null 
steering  and  multiple  source-location  estimation  have  been 
proposed  [11,  10,  1,  2].  Most  of  the  existing  work  on  the 
subject  is  for  the  case  of  a  single  node  employing  an  array 
antenna  which  communicates  with  another  transceiver  in  the 
presence  of  multiple  interferers. 


It  is  apparent  that  the  capability  of  an  array  antenna  to 
shape  its  radiation  pattern  can  be  utilized  in  a  wireless  net¬ 
working  environment  with  multiple  interacting  radio  links. 
The  beamforming  capabilities  of  an  array  antenna  can  be 
used  both  in  the  receiver  and  the  transmitter  end  of  a  wire¬ 
less  link  to  improve  the  quality.  The  receiver  antenna  places 
the  main  lobe  towards  the  direction  of  its  transmitter  in  or¬ 
der  to  increase  the  link  gain,  while  the  reception  nulls  are 
placed  towards  the  directions  of  interfering  transmitters  in 
order  to  reduce  interference.  Similarly  the  transmitter  places 
its  main  lobe  towards  the  direction  of  its  receiver  while  the 
nulls  should  be  placed  such  that  the  receivers  of  other  links 
with  high  interference  levels  are  not  affected.  In  this  way 
the  signal  to  interference  ratio  of  cochannel  links  can  be 
enhanced  and  a  large  number  of  cochannel  connections  can 
be  accommodated  in  each  channel.  Techniques  that  utilize 
the  beamforming  capabilities  of  array  antennas  to  increase 
the  capacity  of  wireless  networks  are  currently  under  inves¬ 
tigation  by  several  researchers  and  they  are  termed  Spatial 
Division  Multiple  Access  (SDMA).  In  this  paper  we  con¬ 
sider  the  problem  of  increasing  network  capacity  by  using 
the  null  placement  capability  of  array  antennas,  to  facilitate 
the  coexistence  of  large  numbers  of  cochannel  links. 

2. Radio Network Model

Consider N radio nodes with one transceiver per node.
The antenna of node v has e_v elements. Neighboring transmissions
using the same channel interfere unless the transmitters
and receivers place the nulls of their radiation patterns
appropriately. We assume that the transmission of
node v interferes with the reception of node w from node k if w
is in the neighborhood of v, and neither v nor w places a null
in the direction from v to w. The proximity among the nodes
and its impact on the signal and interference levels among
them is captured by the connectivity graph G = (X, E)
where X is the set of radio nodes and edge (v, w) belongs
to E if and only if nodes v, w are within range of each
other. Hence edge (v, w) implies both that the nodes
can talk to each other and that they interfere with one
another. A communication link between nodes v and w can
be established only if (v, w) belongs to E. Furthermore
the reception at some node v from some node w is interfered
with by the cochannel transmission of some other node u,
if (u, v) belongs to E and neither v nor u places a reception
or transmission null, respectively, towards the other.

In multihop radio networks, the same channel is reused by
multiple communication links in order to increase the traffic
capacity. Assume that the nodes employ omnidirectional
antennas. Then a set of links S = {(v_1, w_1), …, (v_M, w_M)}
can use the same channel if and only if the receiver w_i
of each link i is within the transmission range of its own
transmitter v_i and out of range of any other transmitter in
the set S. That is, (v_i, w_i) ∈ E for i = 1, …, M and
(v_j, w_i) ∉ E for all i, j = 1, …, M, i ≠ j. For increasing
network capacity, it is desirable to have sets of cochannel
links with large cardinality. The larger the cochannel sets
are, the larger the number of simultaneous transmissions
that can be accommodated with a fixed number of channels.
The problem of link scheduling to alleviate interference and
increase the capacity of packet radio networks was studied
extensively in the past [5, 3, 4, 9, 6, 8].
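
The omnidirectional cochannel condition stated above translates directly into a membership test on the connectivity graph. The sketch below assumes the graph is stored as a set of unordered node pairs; the function names and the toy topology are illustrative.

```python
def in_range(E, a, b):
    """True if nodes a and b share an edge of the connectivity graph."""
    return (a, b) in E or (b, a) in E


def is_cochannel_set_omni(E, links):
    """Check the omnidirectional cochannel condition.

    links is a list of (transmitter, receiver) pairs. Every receiver
    must be in range of its own transmitter and out of range of every
    other transmitter in the set.
    """
    for i, (v_i, w_i) in enumerate(links):
        if not in_range(E, v_i, w_i):
            return False
        for j, (v_j, _) in enumerate(links):
            if j != i and in_range(E, v_j, w_i):
                return False
    return True


# Toy chain topology 1 - 2 - 3 - 4.
E = {(1, 2), (2, 3), (3, 4)}
print(is_cochannel_set_omni(E, [(1, 2), (3, 4)]))  # 3 is in range of 2
print(is_cochannel_set_omni(E, [(1, 2), (4, 3)]))  # 4 is out of range of 2
```

In the first call the transmitter of the second link is a neighbor of the first link's receiver, so the two links cannot share a channel without null placement.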

If  the  nodes  possess  array  antennas,  then  the  neighboring 
links  can  share  the  same  channel  since  cochannel  interfer¬ 
ence  can  be  alleviated  by  null  placement.  Therefore,  larger 
sets  of  links  can  share  the  same  channel  and  the  network 
capacity  may  increase. 

3. Null Placement

Node v has an array with e_v ≥ 1 elements and therefore
can place e_v − 1 nulls, either when it acts as a receiver or as
a transmitter. Since a node v can interfere with or be interfered
with by other nodes which are within its transmission range, the
nulls are placed towards those directions. Let us denote by
N(v) the set of nodes w which are within range of node
v, that is (v, w) ∈ E. Node v places nulls towards the
directions of at most e_v − 1 nodes of those in N(v). The
null placement of node v is described by the set U(v) of the
nodes which are nulled.

Consider a set of links S = {(v_1, w_1), …, (v_M, w_M)}
where the nodes v_i act as transmitters and the nodes w_i
act as receivers. Let V = {v_i : i = 1, …, M} and W =
{w_i : i = 1, …, M} be the sets of transmitters and receivers
respectively. The links in S may constitute a cochannel set
if and only if for any two nodes v_i ∈ V, w_j ∈ W, i ≠ j,
which are within transmission range of each other
((v_i, w_j) ∈ E), either v_i or w_j places a null towards the
other. The beam pattern for the set of links S is specified
by the collection U = {U(v), v ∈ V ∪ W}, which is called the
null placement configuration in the following. Hence, a set
of links is an eligible cochannel set under a null placement
configuration U if and only if

A. Nodes v_i, w_i have their main beams directed towards
each other, therefore v_i ∉ U(w_i), w_i ∉ U(v_i).

B. When (v_i, w_j) belongs to E, i ≠ j, then either w_j ∈
U(v_i) or v_i ∈ U(w_j). (Interference cancellation)

In  determining  cochannel  sets  with  large  cardinality,  the 
next  problem  becomes  of  interest: 

P:  Given  a  set  of  links  S  and  array  antennas  with  a  certain 
number  of  elements  at  each  node,  find  a  null  place¬ 
ment  configuration  under  which  the  set  of  links  S  is 
an  eligible  cochannel  set. 

Let us partition the collection of sets U representing the
null placement configuration for the set of links S in two
subcollections U_V and U_W; U_V = {U(v_i) : i = 1, …, M} is
the null placement of the transmitters and U_W = {U(w_i) :
i = 1, …, M} is the null placement of the receivers. Note
that for a specific null placement configuration U_V in the
transmitters, it is simple to determine whether there is a
receiver null placement configuration U_W such that S is
an eligible cochannel set for the configuration U = U_V ∪
U_W. It is enough that for every node w_i, the number of
transmitters v_j, i ≠ j, which are within range of w_i and have
not placed a null towards the direction of w_i is less than or
equal to e_{w_i} − 1. In this case the receiver w_i may cancel
the interfering nodes by placing its own nulls towards their
directions. Hence problem P can be rephrased as follows:

P1: Find a transmitter null placement configuration U_V
such that for each receiver w_i ∈ W, the number of
transmitters in V still interfering with w_i after the null
placement U_V is less than or equal to e_{w_i} − 1.

In  the  following  we  show  that  a  feasible  null  placement 
configuration  can  be  obtained  by  the  solution  of  a  maxflow 
problem. 

Consider a directed bipartite graph $G_b$ with sets of nodes
$V$ and $W$ and set of edges $E_b$ that consists of all the edges
of $G$ connecting nodes between $V$ and $W$ with an imposed
direction from $V$ to $W$. The edges $(v_i, w_i)$, $i = 1, \ldots, M$,
are excluded from $E_b$. Construct a flow network $G_F$ by
augmenting $G_b$ as follows. Augment the set of nodes $V \cup W$
by two additional nodes $s$ (source) and $d$ (destination).
Augment the set of links by $|V|$ links directed from $s$ to
each one of the nodes in $V$ and $|W|$ links directed from
each node in $W$ to $d$. Define a capacity function on the set
of links as follows. Each link in $E_b$ has capacity equal to
one. A link from $s$ to node $v_i \in V$ has capacity $C_{sv_i}$ equal
to $(d(v_i) - e_{v_i} + 1)^+$, where $d(v_i)$ is the number of links
emanating from node $v_i$ in the graph $G_b$, and a link from a
node $w_i \in W$ to $d$ has capacity $e_{w_i} - 1$.
The null placement problem P1 is equivalent to an integer
maximum flow problem in $G_F$, as argued in the following.
Consider feasible integer flow vectors in $G_F$, that is,
nonnegative integer vectors $f$, with one component $f_{vw}$ for
each link $(v, w)$, which satisfy the link capacity constraints
and the flow conservation equations. The flow transfer of
a flow vector is the sum of the flows of all links emanating
from node $s$. The maxflow problem is to identify the flow
vector with the maximum flow transfer. For more details
on the maxflow problem, the reader is referred to [7]. The
problem P1 and the maxflow problem in $G_F$ are equivalent
in the following sense.

There is a null placement configuration for which $S$ is a
feasible cochannel set if and only if the maximum flow
transfer in $G_F$ is equal to $\sum_{i=1}^{M} C_{sv_i}$.

This claim is justified in the following. Consider a flow
vector $f^0$ that achieves this maximum flow transfer. For that
flow vector, the flow through each link $(s, v_i)$ will be equal
to its capacity, $f^0_{sv_i} = C_{sv_i}$. Note that the flow of each
link in $E_b$ will be equal to 1 or 0 and, because of the flow
conservation equation for $v_i$, exactly $C_{sv_i}$ links emanating
from node $v_i$ will have flow equal to 1 and the rest equal to 0.
That is, exactly $(d(v_i) - e_{v_i} + 1)^+$ links emanating from
$v_i$ have flow equal to 1. For each node $w_i$ there is a number of
links with capacity equal to 1 terminating in $w_i$ and only one
link, with capacity $e_{w_i} - 1$, originating from $w_i$ to $d$. Because
of the flow conservation equations in node $w_i$, at most $e_{w_i} - 1$
links with flow equal to 1 may terminate in $w_i$ from some
node in $V$. Consider a null placement configuration where
each transmitter sets the nulls towards the directions of the
links that carry zero flow and each receiver $w_i$ sets the nulls
towards the directions of the links carrying flow equal to 1.
This null placement is feasible since each node $v_j$ ($w_i$) needs
to place only up to $e_{v_j} - 1$ ($e_{w_i} - 1$) nulls. Furthermore, for
each interfering pair $(v_j, w_i) \in E_b$, $j \neq i$, the interference
is cancelled either by a null from $v_j$ if $f^0_{v_j w_i} = 0$ or by a
null from $w_i$ if $f^0_{v_j w_i} = 1$. Hence it is shown that given a
feasible flow vector with flow transfer equal to $\sum_{i=1}^{M} C_{sv_i}$,
a null placement configuration $U$ for which $S$ is an eligible
cochannel set can be obtained. It can be argued similarly
that if there exists a null placement configuration for which $S$
is a feasible cochannel set, then the maximum flow transfer
in $G_F$ is equal to $\sum_{i=1}^{M} C_{sv_i}$.
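
The flow construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `max_flow` and `null_placement`, the node labels, and the input format (a set of index pairs `(i, j)` meaning that transmitter $v_i$ is within range of receiver $w_j$) are hypothetical, and a plain Edmonds-Karp augmenting-path search stands in for whichever maxflow algorithm [7] describes.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow. cap maps (u, v) -> integer capacity."""
    res = {s: {}, t: {}}                      # residual capacities
    for (u, v), c in cap.items():
        res.setdefault(u, {})
        res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)
    total = 0
    while True:
        parent = {s: None}                    # breadth-first search for a path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push
    # Valid because the network has no antiparallel links.
    flow = {(u, v): c - res[u][v] for (u, v), c in cap.items()}
    return total, flow


def null_placement(interference, e_v, e_w):
    """Return (tx_nulls, rx_nulls), or None when problem P1 is infeasible.

    interference: set of (i, j), i != j, transmitter v_i interferes with w_j.
    e_v[i], e_w[j]: number of array elements at v_i and w_j.
    """
    m = len(e_v)
    pairs = [(i, j) for i, j in interference if i != j]   # (v_i, w_i) excluded
    degree = [0] * m
    cap = {}
    for i, j in pairs:
        cap[(("v", i), ("w", j))] = 1
        degree[i] += 1
    demand = 0
    for i in range(m):
        c = max(degree[i] - e_v[i] + 1, 0)                # (d(v_i) - e_vi + 1)^+
        cap[("s", ("v", i))] = c
        demand += c
    for j in range(m):
        cap[(("w", j), "d")] = e_w[j] - 1
    value, flow = max_flow(cap, "s", "d")
    if value < demand:
        return None
    tx = {i: set() for i in range(m)}
    rx = {j: set() for j in range(m)}
    for i, j in pairs:
        if flow[(("v", i), ("w", j))] == 0:
            tx[i].add(j)        # zero flow: transmitter v_i nulls towards w_j
        else:
            rx[j].add(i)        # unit flow: receiver w_j nulls towards v_i
    return tx, rx
```

For two mutually interfering links with single-element transmitters and two-element receivers, `null_placement({(0, 1), (1, 0)}, [1, 1], [2, 2])` assigns each cancellation to the receiver, while the same call with single-element receivers returns `None`.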

4  Discussion 

A model for a radio network with array antennas was
considered and it was shown how the radiation nulls should
be placed such that a certain collection of links constitutes
an eligible (interference free) cochannel set.

There are several directions for further investigation,
including: more accurate modeling of the interference using
path losses and exact values of the antenna radiation gain
towards each direction; adaptive null placement without
knowledge of the interferers' directions; consideration of the
effect of the signaling schemes on the beamforming. Most
of these problems have already been considered for a single
receiver and remain to be addressed in a network context.
Abstract

A multiple antenna diversity scheme is investigated for
digital wireless communications. Antenna observations are
immediately quantized and sent to a fusion center to
decide which symbol was transmitted. The optimum reception
scheme is described for the case where frequency shift keying
is employed and where slow Rayleigh fading and Gaussian
additive noise are present. Two cases are studied. In the
first case an accurate estimate of the signal-to-noise ratio
is available at each receiver. In the second case estimates
are not available. Results indicate that two or three bit
quantizations may be most appropriate.


1  Introduction 


There is significant interest in using wireless communication
systems in environments where severe multipath
fading is present, which can limit system performance [1].
To mitigate the effects of multipath fading, diversity
techniques using multiple antennas have been proposed [2, 3]
and it has been found that the performance improvements
obtained by using these schemes can be significant. There
appears to be a trend towards increasing the portion of
wireless receivers that are implemented using digital technology
in many applications. Recent improvements in electronic
technology indicate that all-digital receivers are becoming
practical at many frequencies of interest and further
improvements in the speed of analog-to-digital converters are
expected to continue this trend. These facts indicate that
multiple antenna diversity schemes that combine quantized
samples should be considered.

Consider a multipath fading environment where
noncoherent binary frequency shift keying (FSK) is to be
employed^1. Assume that $N$ receivers, each with an associated
antenna, are to be employed to achieve a diversity gain. A
nonselective fading channel is considered where the fading
is assumed to be slow enough so that it can be assumed
constant over several bit periods. In our explicit examples,
Rayleigh fading is assumed. The observations at each
receiver are assumed to include additive zero-mean Gaussian
noise.

Each of the receivers will generate a multiple bit
decision and a single final decision will be made by fusing the
decisions from the individual receivers. Assume that
synchronization between the individual receiver decisions has
been achieved, so that each set of receiver decisions
corresponds to the same transmitted digit. We consider two cases:
one case where an accurate estimate of the signal-to-noise
ratio is available for the observations made at each receiver
and a second case where no such estimate is available.

* This material is based upon work supported by the National Science
Foundation under Grant No. MIP-9211298.

^1 The analysis given here is applicable to spread spectrum signaling as
described in [4].


2  Optimum  Combining 


The optimum scheme (minimum probability of error)
for fusing the decisions from the individual receivers is to
form the likelihood ratio for the set of individual receiver
decisions [5] and to compare this to a threshold. Denote the
decision at the $j$th receiver by $U_j$, which can take on any
of the values $1, \ldots, M_j$. Then the optimum final decision
$U_0$ is to decide for a “1” sent ($U_0 = 1$) if (ones and zeros
equally likely^2)

$$\sum_{j=1}^{N} \sum_{k=1}^{M_j} W_{j,k}\, I(u_j = k) > 0 \tag{1}$$

where

$$W_{j,k} = \ln\left(\frac{\mathrm{Prob}(U_j = k \mid 1\ \text{sent})}{\mathrm{Prob}(U_j = k \mid 0\ \text{sent})}\right), \tag{2}$$

$u_j$ is the observed value of the random variable $U_j$, and
$I(u_j = k)$ is an indicator function which is unity if $u_j = k$
and zero otherwise. If the left hand side of (1) is less than
zero, then the final decision decides a “0” was sent. Note
that we can decide “0” or “1” for the event where the left
hand side of (1) equals zero without affecting performance.
The form of the fusion rule given in (1) is valid in either of
the two cases we consider. The calculations of $\mathrm{Prob}(U_j =
k \mid 1\ \text{sent})$ and $\mathrm{Prob}(U_j = k \mid 0\ \text{sent})$ are different in each
case, since these calculations depend on the schemes used by
the individual receivers to generate their multibit decisions.

^2 The extension to cases where ones and zeros are not equally likely
is straightforward. The zero in (1) becomes
$\ln(\mathrm{Prob}(0\ \text{sent})/\mathrm{Prob}(1\ \text{sent}))$.
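
As a rough illustration of the fusion rule (1) with weights (2), the sketch below sums the log-likelihood weights selected by the observed decisions and compares the result with a threshold. The function names `fusion_weights` and `fuse`, the list-of-lists layout of the probability tables, and the numbers in the usage note are assumptions made for the example; they do not come from the paper. All table entries are assumed strictly positive.

```python
import math


def fusion_weights(p1, p0):
    """W[j][k-1] = ln(Prob(U_j = k | 1 sent) / Prob(U_j = k | 0 sent)).

    p1[j][k-1] and p0[j][k-1] hold the two conditional probabilities for
    receiver j and decision value k (all entries assumed > 0).
    """
    return [[math.log(a / b) for a, b in zip(row1, row0)]
            for row1, row0 in zip(p1, p0)]


def fuse(decisions, weights, prior0=0.5):
    """Return the final decision U_0 for observed decisions u_j in 1..M_j.

    With equally likely symbols the threshold is zero, as in (1); otherwise
    it is ln(Prob(0 sent) / Prob(1 sent)). A tie is resolved as "0", which
    does not change the error probability.
    """
    threshold = math.log(prior0 / (1.0 - prior0))
    total = sum(weights[j][u - 1] for j, u in enumerate(decisions))
    return 1 if total > threshold else 0
```

With two identical four-level receivers and made-up tables `p1 = [[0.05, 0.15, 0.3, 0.5]] * 2` and `p0 = [[0.5, 0.3, 0.15, 0.05]] * 2`, the decisions `(4, 3)` fuse to a "1" and `(1, 3)` to a "0".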

One special case of interest is that where receiver
signal-to-noise ratio (SNR) estimates are available, and where the
sensor SNRs change so slowly that the estimates can be sent
to the fusion center with infinite precision. Since the update
rates necessary are so slow this communication is ignored.
This case was considered in [6] for individual receivers that
make binary decisions. In this special case the weights are
given by

$$W_{j,k} = \ln\left(\frac{\mathrm{Prob}(U_j = k \mid \gamma_j, 1\ \text{sent})}{\mathrm{Prob}(U_j = k \mid \gamma_j, 0\ \text{sent})}\right) \tag{3}$$

where $\gamma_j$ is the SNR estimate at receiver $j$.

3  Optimum  Receiver  Quantizers

Each of the individual receivers consists of two bandpass
matched filters, each matched to a sinusoid (over the bit
period) with a different frequency. A sinusoid with one of
these frequencies corresponds to a “1” being sent, while a
sinusoid with the other frequency corresponds to a “0” being
sent. The outputs of the matched filters are sampled at the
end of the bit interval and then envelope detected to produce
the random variables $R_{0j}$ (large for “0” sent) and $R_{1j}$ (large
for “1” sent) at the $j$th individual receiver. The relative
sizes of $R_{0j}$ and $R_{1j}$ determine the likelihood that a “1”
or “0” was sent. Thus an important quantity is the observed
value $v_j$ of the random variable $V_j = R_{1j} - R_{0j}$, which has
probability density function (pdf) $f_{V_j}(v_j \mid l\ \text{sent})$ given the
symbol $l = 0$ or $l = 1$ was sent.

First consider the case where no estimates of the receiver
SNRs are available. The best decision scheme at the
individual receiver should perform a quantization of the
likelihood ratio of the receiver observations [7]. Thus the
individual receiver should decide $U_j = k$ if $v_j \in A_{j,k}$,
where

$$A_{j,k} = \left\{ v_j : t_{j,k-1} < \ln\left(\frac{f_{V_j}(v_j \mid 1\ \text{sent})}{f_{V_j}(v_j \mid 0\ \text{sent})}\right) \le t_{j,k} \right\} \tag{4}$$

and

$$\frac{f_{V_j}(v_j \mid 1\ \text{sent})}{f_{V_j}(v_j \mid 0\ \text{sent})} = \frac{\int_{\gamma_j=0}^{\infty} f_{V_j|\Gamma_j}(v_j \mid 1\ \text{sent}, \gamma_j)\, f_{\Gamma_j}(\gamma_j)\, d\gamma_j}{\int_{\gamma_j=0}^{\infty} f_{V_j|\Gamma_j}(v_j \mid 0\ \text{sent}, \gamma_j)\, f_{\Gamma_j}(\gamma_j)\, d\gamma_j}. \tag{5}$$

In (5), $f_{V_j|\Gamma_j}(v_j \mid l\ \text{sent}, \gamma_j)$ denotes the conditional pdf of
$V_j$ which is (for unit variance noise) [8]

$$f_{V_j|\Gamma_j}(v_j \mid 0\ \text{sent}, \gamma_j) = \int_{r=\max(0,-v_j)}^{\infty} r \exp\left(-\frac{r^2}{2}\right) \exp(-\gamma_j)\, I_0\left(r\sqrt{2\gamma_j}\right) (r + v_j) \exp\left(-\frac{(r+v_j)^2}{2}\right) dr \tag{6}$$

and under the appropriate symmetry conditions

$$f_{V_j|\Gamma_j}(v_j \mid 1\ \text{sent}, \gamma_j) = f_{V_j|\Gamma_j}(-v_j \mid 0\ \text{sent}, \gamma_j). \tag{7}$$

In (5) $f_{\Gamma_j}(\gamma_j)$ is the pdf of the signal-to-noise ratio at the
$j$th receiver. For example, assuming Rayleigh fading gives
a specific form for $f_{\Gamma_j}(\gamma_j)$ which is

$$f_{\Gamma_j}(\gamma_j) = \frac{1}{\mu_j} \exp\left(-\frac{\gamma_j}{\mu_j}\right) u(\gamma_j) \tag{8}$$

where $\mu_j$ is the average signal-to-noise ratio at the $j$th
receiver and $u(x) = 1$ for $x > 0$ and is zero otherwise. Using
the regions in (4) allows us to compute the required
probabilities needed to calculate (1) as

$$\mathrm{Prob}(U_j = k \mid l\ \text{sent}) = \int_{A_{j,k}} f_{V_j}(v_j \mid l\ \text{sent})\, dv_j, \quad l = 0, 1. \tag{9}$$

Now assume that an estimate of the signal-to-noise ratio
of the observations at each receiver is available, which we
take to be equal to the true SNR $\gamma_j$. In this case, the
decisions from the $j$th individual receiver should be based on $v_j$
and $\gamma_j$. The best decision scheme at the individual
receiver should perform a quantization of the likelihood ratio
of $(v_j, \gamma_j)$. Thus, the receiver should decide $U_j = k$ if
$(v_j, \gamma_j) \in A^{\gamma}_{j,k}$ where

$$A^{\gamma}_{j,k} = \left\{ (v_j, \gamma_j) : t_{j,k-1} < \ln\left(\frac{f_{V_j|\Gamma_j}(v_j \mid 1\ \text{sent}, \gamma_j)}{f_{V_j|\Gamma_j}(v_j \mid 0\ \text{sent}, \gamma_j)}\right) \le t_{j,k} \right\} \tag{10}$$

with

$$\frac{f_{V_j,\Gamma_j}(v_j, \gamma_j \mid 1\ \text{sent})}{f_{V_j,\Gamma_j}(v_j, \gamma_j \mid 0\ \text{sent})} = \frac{f_{V_j|\Gamma_j}(v_j \mid 1\ \text{sent}, \gamma_j)}{f_{V_j|\Gamma_j}(v_j \mid 0\ \text{sent}, \gamma_j)} \tag{11}$$

and the required probabilities needed to calculate (1) are
computed as

$$\mathrm{Prob}(U_j = k \mid l\ \text{sent}) = \iint_{A^{\gamma}_{j,k}} f_{\Gamma_j}(\gamma_j)\, f_{V_j|\Gamma_j}(v_j \mid l\ \text{sent}, \gamma_j)\, dv_j\, d\gamma_j, \quad l = 0, 1. \tag{12}$$
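4  Optimum  Thresholds

For a given set of thresholds $t_{j,k}$ (for every receiver
$j = 1, \ldots, N$ and every quantization level $k$)
and fading statistics, the reception scheme is
now well defined. The receiver thresholds are chosen to
minimize the probability of error which is

$$P_e = \mathrm{Prob}(0\ \text{sent})\, \mathrm{Prob}(\text{error} \mid 0\ \text{sent}) + \mathrm{Prob}(1\ \text{sent})\, \mathrm{Prob}(\text{error} \mid 1\ \text{sent}) \tag{13}$$

where

$$\mathrm{Prob}(\text{error} \mid 0\ \text{sent}) = \sum_{u_1=1}^{M_1} \cdots \sum_{u_N=1}^{M_N} \mathrm{Prob}(U_0 = 1 \mid U_1 = u_1, \ldots, U_N = u_N)\, \mathrm{Prob}(U_1 = u_1 \mid 0\ \text{sent}) \cdots \mathrm{Prob}(U_N = u_N \mid 0\ \text{sent}), \tag{14}$$

and $\mathrm{Prob}(U_0 = 1 \mid U_1 = u_1, \ldots, U_N = u_N)$ is specified by the
fusion rule in (1). The quantities like $\mathrm{Prob}(U_1 = u_1 \mid 0\ \text{sent})$
in (14) can be calculated using (12), or using (9) when no SNR
estimates are available. An expression similar
to (14) exists for $\mathrm{Prob}(\text{error} \mid 1\ \text{sent})$ as given by

$$\mathrm{Prob}(\text{error} \mid 1\ \text{sent}) = \sum_{u_1=1}^{M_1} \cdots \sum_{u_N=1}^{M_N} \mathrm{Prob}(U_0 = 0 \mid U_1 = u_1, \ldots, U_N = u_N)\, \mathrm{Prob}(U_1 = u_1 \mid 1\ \text{sent}) \cdots \mathrm{Prob}(U_N = u_N \mid 1\ \text{sent}). \tag{15}$$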
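
The sums in (13)-(15) can be evaluated by direct enumeration once the per-receiver probability tables are known. The following sketch assumes those tables are already available as nested lists with strictly positive entries (their computation from (9) or (12) is not shown); the function name `error_probability` and its arguments are hypothetical. It needs Python 3.8 or later for `math.prod`.

```python
import itertools
import math


def error_probability(p1, p0, prior1=0.5):
    """Evaluate (13) by enumerating every decision vector (u_1, ..., u_N).

    p1[j][k] = Prob(U_j = k + 1 | 1 sent), p0[j][k] likewise for "0" sent.
    The final decision follows the fusion rule (1); ties are decided as "0".
    """
    weights = [[math.log(a / b) for a, b in zip(r1, r0)]
               for r1, r0 in zip(p1, p0)]
    threshold = math.log((1.0 - prior1) / prior1)
    err0 = 0.0   # Prob(error | 0 sent), equation (14)
    err1 = 0.0   # Prob(error | 1 sent), equation (15)
    for u in itertools.product(*(range(len(row)) for row in p1)):
        decide_one = sum(weights[j][k] for j, k in enumerate(u)) > threshold
        if decide_one:
            err0 += math.prod(p0[j][k] for j, k in enumerate(u))
        else:
            err1 += math.prod(p1[j][k] for j, k in enumerate(u))
    return (1.0 - prior1) * err0 + prior1 * err1
```

The enumeration visits $M_1 \cdots M_N$ decision vectors, so it is practical only for a small number of receivers and quantization levels.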

We have searched for the thresholds which minimize
$P_e$ in (13) by using a numerical gradient descent based
technique. While it is difficult to guarantee that an absolute
minimum has been found, this technique is relatively simple
to apply and solutions which give good performance can be
obtained easily provided only a small number of individual
receivers and quantization levels are involved. As a specific
example, consider a case with two individual receivers with
$\mu_1 = \mu_2 = 10$ dB SNR, and unit variance noise. To simplify
matters assume even symmetric receiver thresholds [8]. To
further simplify matters, consider the case where the sets of
thresholds at each receiver are constrained to be identical.
Table 1 gives the best schemes we found for cases with
receiver SNR estimates available and $M_1 = M_2 = 2, 4, 6$.
The results in Table 1 are for the case where the receiver
SNR estimates must be encoded in the same bits as the
receiver decisions are encoded. If the SNR of the receiver
observations is changing very slowly then one might assume
these estimates can be sent to the fusion center without any
overhead. This is the case considered in [6].

The other possibility is where SNR estimates are not
available. Table 2 gives the best schemes we found
for cases with no receiver SNR estimates available and
$M_1 = M_2 = 2, 4, 6$. These results give an indication of
the performance that can be obtained. The results in both
Table 1 and Table 2 indicate that there is a distinguishable
improvement in performance when using two-bit decisions
over the performance that can be obtained when using single
bit decisions. For increases beyond two bits the performance
gains occur more gradually (results for $M_1 = M_2 > 6$, not
shown in Table 1 and Table 2, show even more gradual
improvement). This suggests that two or three bit
decisions may be adequate in many cases. Similar results have
been obtained at other signal-to-noise ratios. These results
are consistent with those obtained for other quantized reception
problems [9], which are sometimes called distributed
detection problems.

M_1 = M_2    P_e       t_{j,1}   t_{j,2}
2            0.0833
4            0.0251    2.40
6            0.0202    1.02      3.70

Table 1. Best solutions (SNR estimate available)
with SNR = 10 dB. Other receiver
assumed identical ($t_{j,0} = 0$ and $t_{j,M_j/2} = \infty$
for $j = 1, 2$). Rest of thresholds at
$-t_{j,1}, \ldots, -t_{j,M_j/2-1}$.

M_1 = M_2    P_e       t_{j,1}   t_{j,2}
2            0.0833
4            0.0305    1.47
6            0.0245    0.87      2.00

Table 2. Best solutions (no SNR estimate) with
SNR = 10 dB. Other receiver assumed identical
($t_{j,0} = 0$ and $t_{j,M_j/2} = \infty$ for $j = 1, 2$). Rest of
thresholds at $-t_{j,1}, -t_{j,2}, \ldots$

5  Discussion 

It is not surprising that, typically, cases with SNR
estimates yield better performance than cases without estimates.
The best centralized scheme (no quantization) without SNR
estimates is a noncoherent detection scheme which has
received significant attention. If the average SNRs are
identical at each individual receiver then the optimum centralized
scheme is square-law combining. Even if the average SNRs
are identical at each individual receiver, the optimum
centralized scheme for the case where the SNR estimates are
available is not square-law combining. The optimum
centralized scheme is discussed in [10]. This is a case which is
intermediate to that of pure coherent and pure noncoherent
detection. It is useful to note that the performance of this
scheme is bounded by the performance of the coherent and
noncoherent schemes, since it appears difficult to develop
an analytical expression for this performance. Note that the
performance of the optimum centralized scheme allows us
to compute the performance of the distributed scheme as $M_1$
and $M_2$ approach $\infty$.

The binary receiver decision case is quite interesting,
since in this case the performance for the cases given in
Table 1 and Table 2 is exactly the same. This is reasonable
since the best receiver thresholds, even without the symmetry
or identical receiver threshold assumption, are at zero.
In the non-binary cases with SNR estimates available at the
receivers, the thresholds used at the individual receivers are
essentially chosen to be different for each different SNR. In
the binary case this does not occur; the best thresholds are
always zero. Thus the SNR estimate is not actually used. In
fact it is easy to show that in either case the probability of error
is exactly equal to that for the single individual receiver case
with unknown SNR with an average value $\mu = \mu_j$,
which is $P_e = P_s = 1/(2 + \mu)$ (with $P_s$ the single-receiver
error probability). This means that using
the SNR estimate does not improve performance and that
using two rather than one individual receiver also does not
improve performance.
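
The value $1/(2 + \mu)$ can be checked numerically against the first row of Table 1 and Table 2. The sketch below relies on an assumption that is not stated in the text above: the standard conditional error probability of noncoherent binary FSK at SNR $\gamma$, $\frac{1}{2}\exp(-\gamma/2)$, which is then averaged over the Rayleigh-fading SNR density $(1/\mu)\exp(-\gamma/\mu)$ with a composite Simpson rule. The function names are hypothetical.

```python
import math


def conditional_error(gamma):
    """Assumed noncoherent binary FSK error probability at SNR gamma."""
    return 0.5 * math.exp(-gamma / 2.0)


def rayleigh_snr_pdf(gamma, mu):
    """Exponential SNR density under Rayleigh fading with average SNR mu."""
    return math.exp(-gamma / mu) / mu


def average_error(mu, steps=20000):
    """Average conditional_error over the SNR density (Simpson's rule).

    steps must be even; the upper limit is chosen so that the neglected
    tail of the integrand is of order exp(-60).
    """
    upper = 60.0 / (0.5 + 1.0 / mu)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        g = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * conditional_error(g) * rayleigh_snr_pdf(g, mu)
    return total * h / 3.0
```

For an average SNR of 10 dB, that is $\mu = 10$, `average_error(10.0)` agrees with $1/(2 + \mu) = 1/12 \approx 0.0833$.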

Now consider a case with $N$ individual receivers. In
these cases the best fusion rule reduces to a majority rule
which will randomly choose $U_0 = 0$ or $U_0 = 1$ if half
the receivers decide a zero was sent and half decide a one
was sent (under the constraint of like sensors). With $P_s$ as the
error probability of each individual receiver, the overall
error probability with $N$ individual receivers is

$$P_e = \frac{1}{2} \binom{N}{N/2} P_s^{N/2} (1 - P_s)^{N/2} + \sum_{k=\lfloor N/2 \rfloor + 1}^{N} \binom{N}{k} P_s^k (1 - P_s)^{N-k} \tag{16}$$

where the first (tie) term is included only when $N$ is even.
For $N = 1$ or $N = 2$ we see that $P_e = P_s$. In fact if $N$
is any odd integer, (16) shows that there is no improvement
due to increasing $N$ by one. The problem is the random
decision which is made if half the receivers decide a
zero was sent and half decide a one was sent. For $N > 2$
we generally find $P_e < P_s$.
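
The behaviour of the majority rule with random tie-breaking is easy to tabulate. The sketch below is a minimal illustration, assuming $N$ identical receivers whose binary decisions are independent and each wrong with probability `p_s`; the function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(n, p_s):
    """Error probability of a majority vote over n like binary receivers.

    A strict majority of wrong decisions always causes an error; an exact
    tie (possible only for even n) causes an error with probability 1/2.
    """
    total = sum(comb(n, k) * p_s ** k * (1 - p_s) ** (n - k)
                for k in range(n // 2 + 1, n + 1))
    if n % 2 == 0:
        half = n // 2
        total += 0.5 * comb(n, half) * p_s ** half * (1 - p_s) ** half
    return total
```

With `p_s = 1/12`, the values for `n = 1` and `n = 2` coincide, as do those for `n = 3` and `n = 4`, which matches the observation that going from an odd number of receivers to the next even number brings no gain.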

For the special case where the receiver SNRs are changing
slowly, so that exact SNR estimates can be sent to
the fusion center without overhead, the results are different.
The performance in this case must be as good as or
better than the other two cases we consider. In this case
the fusion center can combine the receiver decisions based
on the true SNR of the observations used to make each
decision. For the case of two individual receivers it is
easy to show that the receiver decision with the highest
SNR will determine the final decision. Due to this the
performance is equivalent to that for selection diversity
which is $P_e = 1/(2 + 2\mu + \mu^2/2) < 1/(2 + \mu)$ since
$2 + 2\mu + \mu^2/2 > 2 + \mu$. Thus in this case there is an
improvement over the single individual receiver case.
Abstract

Multiuser detection techniques provide attractive
performance characteristics for CDMA systems. A recently
proposed blind adaptive multiuser detector provides
near-far resistance without requiring any more
information than a conventional detector. In this paper,
an optimum and two suboptimum multi-element blind
adaptive receivers are proposed. These receivers exploit
the spatial distribution of the users in a multiple access
environment. The steady state performance of these
detectors is analyzed and their complexity and the level
of information required by each of them is compared.


1  Introduction 

A major limiting factor in the performance of the
conventional (matched filter) receiver for CDMA systems
is the near-far problem, where a strong interferer
may prevent the reliable detection of the desired user.
Multiuser detection techniques provide alternatives to
the conventional detector, by exploiting various levels
of knowledge about the interfering signals to effect
near-far resistance [4].

The optimum multiuser detector attains the performance
of a single-user detector by assuming the knowledge
of the signature waveform, the timing and the
received amplitude of each of the users. This nonlinear
detector has superior performance to the conventional
detector, but is exponentially complex in the
number of users. Several suboptimum multiuser detectors
have been proposed which require less knowledge
of the interfering signals and/or have lower complexity,
but maintain near-far resistance. An example is the
decorrelating detector, which performs a linear transformation
on the outputs of the matched filter, cancelling
out the effect of multiple access interference on
each user. When the interfering users are weak compared
to the background noise level, the performance of the
decorrelating detector may become worse than that of a
conventional detector. Linear MMSE detectors solve this
problem by incorporating the knowledge of the users'
energies [2]. These detectors perform like a decorrelator
in the presence of strong multiple access interference,
and like a conventional detector when the background
noise dominates. The chief advantage of the
MMSE detector, however, is in its ability to be easily
implemented in an adaptive fashion. This eliminates
the need for the knowledge of the signature waveforms
of the interfering users. However, a training sequence has
to be retransmitted every time there is a severe change in
the received signal, which can become cumbersome in
rapidly changing environments.

* This work was supported in part by the National Science
Foundation under Grant MIP-9202081.

A blind adaptive multiuser detector has been proposed
which overcomes this problem [1]. This receiver,
which only requires the knowledge of the signature
sequence of the desired user and its timing (the same as the
conventional detector), uses the output energy of the
receiver as its cost function. The receiving filter is
derived by minimizing the output energy, subject to a
constant response to the signature waveform of the desired
user. This detector, which is similar to the generalized
sidelobe canceller in array processing, consists of two
orthogonal branches, where the filter in one branch is
the desired user's signature sequence, while the other
filter is adapted to minimize multiple access interference.
It can be shown that the mean-square error and
the output energy differ by a constant, and therefore it
is possible to achieve the MMSE detector performance,
e.g. near-far resistance, without requiring a training
sequence.

In order to reduce interference further, we present
a multi-element blind adaptive multiuser detector. An
optimum multi-element detector, along with two
suboptimum detectors, are derived, analyzed and compared.
Specifically, the interaction of the spatial and
temporal processing stages of the multi-element blind
receiver is discussed. A blind adaptive multi-element
CDMA receiver which uses a conventional detector as
the temporal processing stage was presented in [3].
Part of the work presented here is a combination of
that blind array with the blind adaptive detector in [1].

Figure 1. Optimum multi-element receiver

2  Signal  Model

Let $K$ users transmit simultaneously over a passband
channel. These transmissions are received by an
array of $M$ antennas. The propagation delay between
antenna elements is assumed to be small relative to the
inverse of the transmission bandwidth, i.e. the received
signals at the $M$ baseband array outputs are identical
to within a complex constant. The vector of received
signals is then given by

$$\mathbf{x}(t) = \sum_{k=1}^{K} \mathbf{a}_k e_k b_k s_k(t) + \sigma \mathbf{n}(t), \tag{1}$$

where $\mathbf{x}(t) = [x_1(t) \cdots x_M(t)]^T$, $\mathbf{a}_k = [a_{1k} \cdots a_{Mk}]^T$ is
the array response vector for user $k$, $e_k^2$ is the energy,
$b_k$ is the transmitted bit of user $k$,
$s_k(t)$ is the normalized signature waveform over symbol
interval $T$, and $\mathbf{n}(t)$ is the vector of the additive
white Gaussian noise. The $\{s_k(t)\}$ are real and linearly
independent, with chip duration $T_c = T/N$. Using vector
representation for signature waveforms, we can rewrite
the received signal as an $M \times N$ matrix:

$$\mathbf{X} = \sum_{k=1}^{K} \mathbf{a}_k e_k b_k \mathbf{s}_k^T + \sigma \mathbf{N}. \tag{2}$$

3  Optimum  multi-element  receiver 

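Figure 1 shows the block diagram of an optimum
multi-element blind adaptive receiver. This can be
thought of as the natural extension of the single-element
blind adaptive multiuser detector [1]. The received
signal at each antenna can be written as:

$$\mathbf{x}_m = \sum_{k=1}^{K} a_{mk} e_k b_k \mathbf{s}_k + \sigma \mathbf{n}_m = \sum_{k=1}^{K} e_k b_k \mathbf{s}_{mk} + \sigma \mathbf{n}_m, \tag{3}$$

where $\mathbf{s}_{mk} = a_{mk} \mathbf{s}_k$ and $\mathbf{s}_{m1}$ is assumed to be known.
The output of the linear filter is:

$$y_m = \sum_{k=1}^{K} e_k b_k \langle \mathbf{s}_{mk}, \mathbf{v}_m \rangle + \sigma \langle \mathbf{n}_m, \mathbf{v}_m \rangle. \tag{4}$$

For convenience, we assume the desired user to be
$k = 1$. As mentioned before, each linear filter consists
of two orthogonal branches $\mathbf{v}_m = \mathbf{s}_{m1} + \mathbf{h}_m$, where
$\langle \mathbf{s}_{m1}, \mathbf{h}_m \rangle = 0$. Orthogonality of the two filters
ensures that no component of the desired user is passed
through the adaptive branch, thereby avoiding the
cancellation of the desired signal at the output of the
detector. Combining the outputs of the filters results in

$$y = \sum_{k=1}^{K} e_k b_k \left( \sum_{m=1}^{M} \langle \mathbf{s}_{mk}, \mathbf{v}_m \rangle \right) + \sigma \sum_{m=1}^{M} \langle \mathbf{n}_m, \mathbf{v}_m \rangle. \tag{5}$$

To adapt this receiver to detect the desired user in a
blind fashion, the linear filters are chosen to minimize
the output energy, while maintaining the response of
the receiver to the desired user at a constant level. The
optimization problem can be formulated as:

$$\min_{\mathbf{v}_1, \ldots, \mathbf{v}_M} E[yy^*] \quad \text{subject to} \quad \sum_{m=1}^{M} \langle \mathbf{s}_{m1}, \mathbf{v}_m \rangle = 1, \tag{6}$$

where

$$E[yy^*] = \sum_{m,m'=1}^{M} \mathbf{v}_m^H \mathbf{R}_{m,m'} \mathbf{v}_{m'}, \tag{7}$$

$$\mathbf{R}_{m,m'} = \mathbf{S}_m \mathbf{E}^2 \mathbf{S}_{m'}^H + \sigma^2 \delta_{m,m'} \mathbf{I}_N, \quad \mathbf{S}_m = [\mathbf{s}_{m1}, \cdots, \mathbf{s}_{mK}], \quad \mathbf{E} = \mathrm{diag}(e_1, \cdots, e_K). \tag{8}$$

Arranging $\mathbf{R}_{m,m'}$ for all $m, m' = 1, \ldots, M$ into an
$MN \times MN$ matrix $\mathbf{R} = [\mathbf{R}_{m,m'}]$ and forming
$\mathbf{v} = [\mathbf{v}_1^T, \cdots, \mathbf{v}_M^T]^T$ and $\mathbf{s}_1 = [\mathbf{s}_{11}^T, \cdots, \mathbf{s}_{M1}^T]^T$, we can
rewrite the optimization problem as:

$$\min_{\mathbf{v}} \mathbf{v}^H \mathbf{R} \mathbf{v} \quad \text{subject to} \quad \mathbf{s}_1^H \mathbf{v} = 1. \tag{9}$$

The solution to this problem is known to be

$$\mathbf{v} = \frac{\mathbf{R}^{-1} \mathbf{s}_1}{\mathbf{s}_1^H \mathbf{R}^{-1} \mathbf{s}_1}. \tag{10}$$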
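
A small numerical sketch may help to see what a constrained minimum-output-energy solution of this form does. It is restricted, as an assumption made purely for illustration, to the real-valued single-antenna case ($M = 1$), where the correlation matrix reduces to $\sigma^2 \mathbf{I}$ plus the sum of $e_k^2 \mathbf{s}_k \mathbf{s}_k^T$ over the users and the solution reduces to $\mathbf{R}^{-1}\mathbf{s}_1 / (\mathbf{s}_1^T \mathbf{R}^{-1} \mathbf{s}_1)$. The helper names `solve`, `min_output_energy_filter` and `output_energy` are hypothetical, and the filter is computed from known signatures rather than adapted blindly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def min_output_energy_filter(signatures, amplitudes, sigma):
    """Return (v, R) with v = R^{-1} s_1 / (s_1^T R^{-1} s_1).

    signatures[k] is the unit-norm real signature of user k (index 0 is the
    desired user), amplitudes[k] is e_k, sigma is the noise scale.
    """
    n = len(signatures[0])
    r = [[(sigma ** 2 if i == j else 0.0)
          + sum(e * e * s[i] * s[j] for s, e in zip(signatures, amplitudes))
          for j in range(n)] for i in range(n)]
    s1 = signatures[0]
    x = solve(r, s1)
    scale = sum(a * b for a, b in zip(s1, x))
    return [xi / scale for xi in x], r


def output_energy(v, r):
    """Quadratic form v^T R v, i.e. the output energy E[y y*]."""
    n = len(v)
    return sum(v[i] * r[i][j] * v[j] for i in range(n) for j in range(n))
```

With a strong interferer whose signature is correlated with that of the desired user, the resulting filter keeps unit response to $\mathbf{s}_1$ while its output energy is much lower than that of the matched filter $\mathbf{v} = \mathbf{s}_1$.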


4  Suboptimum  multi-element  receiver 


The block diagram of the suboptimum multi-element
blind adaptive multiuser detector is shown in
Figure 2. The first stage of this receiver is a beamformer,
where the received signals at each antenna
are weighted and combined. The beamformer output
is sent through an adaptive linear filter. Consider
$\mathbf{w} = [w_1 \cdots w_M]^T$ to be the weight vector; then the
output of the beamformer is given by:

$$\mathbf{w}^H \mathbf{X} = \sum_{k=1}^{K} (\mathbf{w}^H \mathbf{a}_k) e_k b_k \mathbf{s}_k^T + \sigma (\mathbf{w}^H \mathbf{N}). \tag{11}$$

If $\mathbf{v}$ is the impulse response of the linear filter, the
output of this receiver can be written as:

$$y = \mathbf{w}^H \mathbf{X} \mathbf{v} = \sum_{k=1}^{K} (\mathbf{w}^H \mathbf{a}_k) e_k b_k \langle \mathbf{s}_k, \mathbf{v} \rangle + \sigma (\mathbf{w}^H \mathbf{N} \mathbf{v}). \tag{12}$$

The beamformer in the first stage translates the
multi-dimensional problem into a single-dimension problem;
the only difference is that the output of the receiver
is now a function of both $\mathbf{v}$ and $\mathbf{w}$, so a performance
improvement due to added spatial discrimination is
expected. We call this receiver suboptimum, mainly
because the total number of adaptive filter taps is $M + N$,
compared to $MN$ taps for the optimum receiver. It
has reduced complexity, but inferior performance to
the optimum receiver.

Figure 2. Suboptimum multi-element receiver

4.1  Known  array  response  vector 

In this section, we assume that the suboptimum receiver knows the array response vector of the desired user, and develop the adaptation rule for the beamformer weights as well as for the taps of the linear filter. The cost function to optimize is the output energy of the receiver. To detect the desired user, $\mathbf{w}$ and $\mathbf{v}$ are varied to minimize the output energy, subject to the constraints of constant spatial gain in the direction of the desired user and constant temporal gain for the signature waveform of the desired user. In other words,

$$\min_{\mathbf{w},\mathbf{v}} E[yy^*] \quad \text{subject to} \quad \langle \mathbf{s}_1, \mathbf{v} \rangle = 1, \quad \mathbf{w}^H \mathbf{a}_1 = 1. \qquad (13)$$


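Using Eq. 12, the output energy can be written as:

$$E[yy^*] = \sum_{k=1}^{K} \langle \mathbf{v}, \mathbf{s}_k \rangle^2\, |\mathbf{w}^H \mathbf{a}_k|^2 + \sigma^2\, \mathbf{w}^H \mathbf{w}\, \langle \mathbf{v}, \mathbf{v} \rangle \qquad (14)$$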


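The solution to the optimization problem is:

$$\mathbf{v} = \frac{\mathbf{R}_s^{-1} \mathbf{s}_1}{\mathbf{s}_1^T \mathbf{R}_s^{-1} \mathbf{s}_1}, \qquad \mathbf{w} = \frac{\mathbf{R}_a^{-1} \mathbf{a}_1}{\mathbf{a}_1^H \mathbf{R}_a^{-1} \mathbf{a}_1} \qquad (15)$$

where

$$\mathbf{R}_s = \mathbf{S} \boldsymbol{\Lambda}_w \mathbf{S}^T + \sigma^2 (\mathbf{w}^H \mathbf{w})\, \mathbf{I}_N,$$
$$\mathbf{R}_a = \mathbf{A} \mathbf{S}_v \mathbf{A}^H + \sigma^2 \langle \mathbf{v}, \mathbf{v} \rangle\, \mathbf{I}_M,$$
$$\mathbf{S} = [\mathbf{s}_1, \cdots, \mathbf{s}_K], \qquad \mathbf{A} = [\mathbf{a}_1, \cdots, \mathbf{a}_K],$$
$$\boldsymbol{\Lambda}_w = \mathrm{diag}(|\mathbf{w}^H \mathbf{a}_1|^2, \cdots, |\mathbf{w}^H \mathbf{a}_K|^2),$$
$$\mathbf{S}_v = \mathrm{diag}(\langle \mathbf{v}, \mathbf{s}_1 \rangle^2, \cdots, \langle \mathbf{v}, \mathbf{s}_K \rangle^2). \qquad (16)$$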


Since $E[yy^*]$ is a function of both $\mathbf{w}$ and $\mathbf{v}$, the solutions for $\mathbf{w}$ and $\mathbf{v}$ are not independent, making it very difficult to analyze the system.
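For a fixed $\mathbf{v}$, the beamformer step of this constrained problem has the familiar minimum-variance form $\mathbf{w} = \mathbf{R}_a^{-1}\mathbf{a}_1 / (\mathbf{a}_1^H \mathbf{R}_a^{-1} \mathbf{a}_1)$. The sketch below evaluates it for M = 2 antennas. The matrix `Ra`, the response vectors and the function names are hypothetical values chosen for illustration; the coupling noted above arises because, in the actual receiver, $\mathbf{R}_a$ itself changes with $\mathbf{v}$.

```python
def mvdr_weights_2x2(R, a):
    """w = R^{-1} a / (a^H R^{-1} a) for a 2x2 Hermitian positive-definite R."""
    (r11, r12), (r21, r22) = R
    det = r11 * r22 - r12 * r21
    # Closed-form 2x2 inverse applied to a.
    u = [(r22 * a[0] - r12 * a[1]) / det, (-r21 * a[0] + r11 * a[1]) / det]
    denom = a[0].conjugate() * u[0] + a[1].conjugate() * u[1]
    return [u[0] / denom, u[1] / denom]

def quad_form(w, R):
    """Return the real output energy w^H R w."""
    Rw = [R[0][0] * w[0] + R[0][1] * w[1], R[1][0] * w[0] + R[1][1] * w[1]]
    return (w[0].conjugate() * Rw[0] + w[1].conjugate() * Rw[1]).real

# Hypothetical scenario: desired response a1 (power 1), interferer a2 (power 10), unit noise.
a1 = [1 + 0j, 1 + 0j]
a2 = [1 + 0j, -1j]
Ra = [[a1[i] * a1[j].conjugate() + 10 * a2[i] * a2[j].conjugate() + (1.0 if i == j else 0.0)
       for j in range(2)] for i in range(2)]

w = mvdr_weights_2x2(Ra, a1)
gain = w[0].conjugate() * a1[0] + w[1].conjugate() * a1[1]   # constraint w^H a1
w_plain = [0.5 + 0j, 0.5 + 0j]   # also has unit gain toward a1 but ignores Ra
```

Both weight vectors satisfy the spatial constraint, but the minimum-variance weights give a lower output energy because they also suppress the interferer.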


4.2  Unknown  array  response  vector 

In this part we consider a situation where no information about the array response vector of the user of interest is available. This is mostly the case in practice, where incoherent modulation and multipath propagation make it impossible to obtain a correct estimate of the array response vector at the receiver. The proposed receiver uses knowledge of the signature sequence of the desired user to blindly adapt the beamformer weights, so that the desired user is passed while the energy of the interfering signals is minimized at the output of the beamformer. The output energy of the beamformer is used as the cost function; the beamformer weights are chosen so that this energy is minimized. A constraint must be used so that the trivial solution is avoided. As explained before, the linear filter maintains a constant response to the desired user while it adapts itself to minimize the interference. Thus, we can use as the constraint a constant energy at the output of the receiver in Figure 2. The optimization rule for the beamformer weights can therefore be written as follows:

$$\min_{\mathbf{w}} E[\mathbf{r}^H \mathbf{r}] \quad \text{subject to} \quad E[yy^*] = 1, \qquad (17)$$

where

$$E[\mathbf{r}^H \mathbf{r}] = \mathbf{w}^H \mathbf{R} \mathbf{w},$$
$$\mathbf{R} = \mathbf{A} \mathbf{E}^2 \mathbf{A}^H + N \sigma^2 \mathbf{I}_M,$$
$$E[yy^*] = \mathbf{w}^H \mathbf{R}_a \mathbf{w}. \qquad (18)$$
Table 1. Angles of arrival and SNRs (N = 7, K = 4)

User          1       2       3       4
rho_1k        1      3/7    -1/7    -3/7
AOA           0°      5°    -25°     55°
SNR (dB)     10      14      20      14


Table 2. SNIRs for various receivers

                  M = 1                M = 2
Receiver      Conv     Blind      Opt     Sub (a_1 known)    Sub (a_1 unknown)
SNIR (dB)    -13.7      6.7      12.1          11               [illegible]



The solution to this problem is given by the generalized eigenvector corresponding to the minimum eigenvalue of the matrix pencil $(\mathbf{R}, \mathbf{R}_a)$. The optimization rule for the linear filter is the same as in the previous case. Because of the stronger constraint used for the beamformer, the first receiver has better performance, but the advantage of the second one is that it does not require any more knowledge than the conventional detector and the single-element blind detector.
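The generalized-eigenvector solution can be illustrated on a small case. The sketch below handles only real symmetric 2x2 matrices, which is a simplifying assumption: the matrices in the paper are complex Hermitian and M x M, and the function name and numerical values are hypothetical.

```python
import math

def min_generalized_eigvec_2x2(R, B):
    """Minimise w^T R w subject to w^T B w = 1 (real symmetric 2x2, B positive definite)."""
    (r11, r12), (_, r22) = R
    (b11, b12), (_, b22) = B
    # det(R - lam * B) = 0 is the quadratic c2*lam^2 + c1*lam + c0 = 0.
    c2 = b11 * b22 - b12 * b12
    c1 = -(r11 * b22 + r22 * b11 - 2.0 * r12 * b12)
    c0 = r11 * r22 - r12 * r12
    disc = math.sqrt(max(c1 * c1 - 4.0 * c2 * c0, 0.0))
    lam = (-c1 - disc) / (2.0 * c2)            # smaller root, since c2 > 0
    # Null vector of (R - lam * B), taken from the row with the larger diagonal entry.
    m11, m12, m22 = r11 - lam * b11, r12 - lam * b12, r22 - lam * b22
    w = [-m12, m11] if abs(m11) >= abs(m22) else [-m22, m12]
    # Scale so that the constraint w^T B w = 1 holds (fails only if R = lam * B).
    norm = math.sqrt(b11 * w[0] ** 2 + 2.0 * b12 * w[0] * w[1] + b22 * w[1] ** 2)
    return lam, [w[0] / norm, w[1] / norm]

R = [[3.0, 1.0], [1.0, 2.0]]      # stands in for the beamformer output correlation
B = [[2.0, 0.0], [0.0, 1.0]]      # stands in for R_a
lam, w = min_generalized_eigvec_2x2(R, B)
cost = R[0][0] * w[0] ** 2 + 2 * R[0][1] * w[0] * w[1] + R[1][1] * w[1] ** 2
constraint = B[0][0] * w[0] ** 2 + 2 * B[0][1] * w[0] * w[1] + B[1][1] * w[1] ** 2
```

At the solution the minimised cost equals the smallest generalized eigenvalue, which is why the eigenvector of the pencil solves the constrained problem.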

5  Performance  analysis 

In this section, we analyze and compare the steady-state performance of the proposed detectors with the single-element blind detector and the conventional detector. As the performance measure, we use the signal to interference plus noise ratio (SNIR). We also compare the beampatterns of the two suboptimum detectors, which employ a beamformer as their first stage. We consider a system with K = 4 users, a processing gain of N = 7 and M = 2 antennas.

Figure 3 shows the spatial distribution of the four users, along with the beampatterns corresponding to the two suboptimum beamformers discussed in the previous section. Table 1 lists the angles of arrival, the cross-correlations with user 1 and the SNRs for the desired user and the interfering users.

It can be seen that adding an antenna improves the signal to interference ratio of the receiver. The spatial gain depends on the algorithm used. An interesting fact about the suboptimum multi-element detectors is the way the spatial and temporal processing stages interact. We can see that although the detector with no knowledge of $\mathbf{a}_1$ puts a stronger null in the direction of the strongest interferer, the overall performance of the other detector is better. This is because the level of interference rejection of the linear temporal filter is a nonlinear function of the strength of interference at its input. The interplay between interference rejection by temporal and spatial processing is under investigation.

Figure 3. Beampatterns of the suboptimum receivers

6  Conclusion 

The steady-state performance of three multi-element blind adaptive receivers was studied. It can be seen that all these receivers can take advantage of the spatial distribution of the users to reduce the level of interference. More study is needed to investigate the robustness of these receivers when there is no spatial discrimination between the users. Future work will also include the analysis of the adaptive algorithm for these receivers, and the study of their performance in an asynchronous multipath environment.
Abstract

An upper bound is derived for the probability of error in an asynchronous binary direct-sequence spread-spectrum multiple-access communications system operating over frequency-selective Rayleigh fading channels. A coherent RAKE receiver with predetection selective diversity combining is considered. The performance of a multipath-combining receiver is determined for the case of multiple interfering transmitters. Furthermore, the performance of the system is determined in terms of parameters of the signature sequences. These parameters can be used as guides in selecting sequences for the system. The bounds agree with the exponential portion of a normal distribution in which the interference components subtract from the signal amplitude. The results obtained are verified by simulation.

I. Introduction

In this paper, we consider the performance analysis of the multipath-combining receiver, also called the RAKE receiver. The analysis applies to systems that use binary phase-shift-keyed (BPSK) modulation. We consider a multipath-combining receiver and determine the performance of the system for the case of multiple interfering transmitters that use different PN sequences with a small cross-correlation.

System designers usually assume that the limiting corrupting signal has a Gaussian distribution. This assumption can no longer be justified, since the interference in multiple-access schemes does not show Gaussian characteristics. Although the thermal noise power is small, it is necessary to consider the joint effect of the Gaussian-distributed thermal noise and the non-Gaussian-distributed interference.

The analysis will be based on the well-known Chernoff bound. This analysis requires only the evaluation or bounding of the moment-generating function of the additive interference. The bound is expressed in terms of a parameter that is the unique solution of an equation containing the derivative of the moment-generating function of the interference. The bound is tight for high SNRs, which is the region of interest in most mobile communications systems.

II.  A  System  and  Channel  Model 


The communications system considered in this study is shown in Figure 1.


Fig. 1. Communications system with multiple users transmitting simultaneously


The k-th transmitted signal for a binary DS-CDMA system with BPSK modulation and arbitrary chip waveform can be expressed as

$$s_k(t) = \sqrt{2P_k} \sum_{i} b_k(i)\, u_k(t - iT) \qquad (1)$$

where $P_k$ is the power of the k-th transmitted signal, $u_k(t)$ is the signature waveform and $b_k(i)$ is the i-th symbol of the k-th user. An equal-power assumption on the $P_k$ is made for convenience in the analysis. We assume that there are N code chips in each data symbol ($T = NT_c$) and that the period of the signature sequence $u_k(t)$ is N. The received signal from a typical transmitter consists of a random number of paths of the transmitted signal. The delay, amplitude, and phase associated with each path are also random:

$$r_k(t) = \sum_{n=1}^{L} a_n(t) \exp(j\Phi_n(t))\, s_k[t - \tau_n(t)] + z(t) \qquad (2)$$
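The spreading operation in (1) can be sketched at chip rate. The following is a minimal illustration that assumes a rectangular chip waveform, unit amplitude (the power factor is omitted) and a single path with no noise; the length-7 code and the function names are hypothetical choices, not values from the paper.

```python
def spread(bits, code):
    """Chip-rate samples of sum_i b(i) u(t - iT) for a rectangular chip waveform."""
    return [b * c for b in bits for c in code]

def despread(chips, code):
    """Correlate each symbol interval with the code and take the sign."""
    n_chips = len(code)
    decisions = []
    for start in range(0, len(chips), n_chips):
        corr = sum(x * c for x, c in zip(chips[start:start + n_chips], code))
        decisions.append(1 if corr >= 0 else -1)
    return decisions

code = [1, 1, 1, -1, -1, 1, -1]    # hypothetical +/-1 signature sequence, N = 7
bits = [1, -1, -1, 1]              # BPSK symbols b(i)
tx = spread(bits, code)            # N chips per symbol, so T = N * Tc
rx = despread(tx, code)
```

Each symbol occupies N chips, and correlating with the same code over one symbol interval recovers the transmitted bit.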

In (2) the random variable L represents the number of paths of the k-th user. The random variables $a_n(t)$, $\tau_n(t)$ and $\Phi_n(t)$ represent the amplitude, delay, and phase associated with the n-th path of a signal from transmitter k. We consider the delay $\tau_n(t)$ to be an integer multiple of $T_c$. The term $z(t)$ represents the additive white Gaussian noise (AWGN) with complex spectral density $N_0$ W/Hz.

In a real communication system the links from each of the K active transmitters to the listening receiver are mutually independent. The random variables $a_n(t)$, $\tau_n(t)$, $\Phi_n(t)$ and the data symbols are assumed to form a set of mutually independent random variables. Also, each phase $\Phi_n(t)$ is assumed to be uniformly distributed on the interval $[0, 2\pi)$. We model the arrival of signal paths at the receiver, $\tau_n(t)$, by a nonhomogeneous Poisson process with arrival rate $P_d(\tau)$. The amplitudes of the signal paths, $a_n(t)$, exhibit a Rayleigh distribution. The structure of the receiver is shown in Figure 2.


It employs a tapped delay line through which the received signal $r_k(t)$ is passed. The signal at each tap is correlated with $(c_n^k)^*(t)\, u_k^*(t)$, n = 1, 2, ..., L, where $u_k(t)$ is the PN sequence of the k-th desired user, $c_n^k = a_n^k \exp(j\Phi_n^k)$ is the impulse response of the channel associated with the desired user and (*) denotes complex conjugate.

III.  Evaluation  of  Error  Probability  Bound 


We shall now evaluate the performance of the RAKE receiver with simultaneously active users under the condition that the fading is sufficiently slow to allow us to estimate $c_n^k(t)$ perfectly (without noise). Thus the decision variables may be expressed in the form

$$U_k(i) = \Re e \left\{ \sum_{q=1}^{L} (c_q^k)^* \int_T r(t)\, u_k^*(t - qT_c)\, dt \right\} \qquad (3)$$

Stationarity of the amplitudes of the arriving paths is assumed. The received signal can be expressed as follows


$$r(t) = \sum_{i} b_k(i) \sum_{n=1}^{L} c_n^k\, u_k(t - nT_c - iT) + \sum_{\substack{nf=1 \\ nf \neq k}}^{N_f} \sum_{i} b_{nf}(i) \sum_{p=1}^{L} c_p^{nf}\, u_{nf}(t - pT_c - iT) + z(t) \qquad (4)$$


where $u_{nf}(t)$ are the PN sequences and $b_{nf}(i)$ the symbols of the $N_f$ users that share the same transmission band. Eq. (3) can be expressed as:


Nf  L  L 


nf=l  -d>q)b^|,u„f(t-nTc)Uk(t-qTc)dt 

nf=k,q;in 

L  k  k 

+  9te  a  expHO^)J  z(t)u*(t~qTc)dt  (5) 

q=l  q  q  T  k  ^ 

When the signals are antipodal, a single decision variable suffices. Then, if we consider the maximum cross-correlation value between the set of signature waveforms, Eq. (5) can be simplified to the following:

L...  Nf  LL..  L  ,  , 

U  b  eS  (a^)^+7E  I 

k  k  q=l  q  nf=l  nf=l  q=l  q  n  q=l  q  q 

^  nf=k,q9tn  ^ 

withiN'^  =  e  “I  Jz(t)u* (t -  qTc)  and  e  ‘J’"  =  e  "  b  . 


2(111  + 1)  /  2  ^  j 


lu  (t-mTc)u  (t-*pTc)dt 


'  k  =  q,  p  m 


where $\varepsilon = \int_T u_k(t)\, u_k^*(t)\, dt$ is the energy of the k-th signature waveform and the normalized cross-correlation bound holds for m-length Gold sequences. The interference will be modeled as follows:

$$\eta = \gamma \sum_{i=1}^{N_f L'} \xi_i \cos\theta_i \qquad (7)$$

with $\xi_i$ being a random variable that results from the product of two independent Rayleigh-distributed random variables; $\theta_i$ is assumed to be uniformly distributed in the interval $[0, 2\pi)$ and $L' = L(L-1)$. We define the total interference z as $z = \eta + n$, that is, the multi-user interference plus the thermal noise term. Since z is the sum of two independent, zero-mean random variables, the Chernoff bound applies:

$$\Pr(z > x) \le \exp(-\lambda x)\, E[\exp(\lambda n)]\, E[\exp(\lambda \eta)] \quad \text{for all } \lambda > 0 \qquad (8)$$
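To see how a Chernoff bound of this form behaves, consider only the thermal-noise factor, for which the moment-generating function is known in closed form. The sketch below assumes a zero-mean Gaussian variable with standard deviation `sigma`; the function names and the numerical values are illustrative and are not taken from the paper.

```python
import math

def chernoff_bound_gaussian(x, sigma, lam):
    """exp(-lam*x) * E[exp(lam*n)] for zero-mean Gaussian n with standard deviation sigma."""
    return math.exp(-lam * x + 0.5 * (lam * sigma) ** 2)

def gaussian_tail(x, sigma):
    """Exact Pr(n > x) for the same Gaussian n."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))

x, sigma = 3.0, 1.0
lam_opt = x / sigma ** 2     # zero of the derivative of the exponent with respect to lam
tightest = chernoff_bound_gaussian(x, sigma, lam_opt)
exact = gaussian_tail(x, sigma)
```

The bound holds for every positive value of the parameter, and the tightest version, exp(-x^2 / (2 sigma^2)), follows the exponential portion of the Gaussian tail.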

After some manipulations, (8) becomes:

-U  NfL*  /  \ 

Pr(z  >  x)  <  exp  exp  ”  0  Io^X4.  j  for  all  A,  >  0  (9) 


for  all  ^  >  0 


Equation (9) can be simplified by using an exponential upper bound on the zero-order modified Bessel function:

Pr(z  >  x)  <  exp”^^  exp^  ”  exp  ‘  for  all  X  >  0  (10) 
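The Bessel function appears because averaging exp(λ ξ cos θ) over a uniformly distributed phase θ gives $I_0(\lambda \xi)$. One standard exponential bound is $I_0(x) \le \exp(x^2/4)$; whether this is the exact bound used by the authors cannot be recovered from the text, so the sketch below is only an illustration of the idea, with hypothetical function names.

```python
import math

def bessel_i0(x, terms=60):
    """Zero-order modified Bessel function from its series sum_k (x^2/4)^k / (k!)^2."""
    q, term, total = 0.25 * x * x, 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def i0_exponential_bound(x):
    """Bound I0(x) <= exp(x^2/4), obtained by replacing (k!)^2 with k! in the series."""
    return math.exp(0.25 * x * x)

samples = [0.0, 0.5, 1.0, 2.0, 4.0]
pairs = [(bessel_i0(x), i0_exponential_bound(x)) for x in samples]
```

Replacing each Bessel factor by an exponential turns the product over interference terms into a single exponential, which is what makes the minimisation over the Chernoff parameter tractable.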

The optimal value of $\lambda$, which minimizes the right-hand side of (10), is obtained by setting the derivative of the expression with respect to $\lambda$ to zero:


Pr(z  >  x)  ^  exp- 


L'  r,  L’Nf 

EXaf-Ey  It 
i=l  ‘ 
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where  =  E  No  X  a?  ■  We  define  7b  as: 
“  i  =  1  ' 

L'  ,  L'Nf 

E  X  a?  -  Ey  X  L 
i=l  *  i=l 


2 

1  ® 

V 


(12) 


The final step in this derivation is to average the conditional error probability given in (11) over the fading channel statistics. Thus we evaluate the integral:

$$P_e = \int \Pr(\gamma_b)\, f(\gamma_b)\, d\gamma_b \qquad (13)$$

It is difficult to find the distribution function of (12), and the resulting expression does not yield a closed form for (13). Thus we use the average of $a_n^2$ instead of its exact value. Defining $E[a_n^2] = 2\sigma^2$, the expression (12) reduces to:


r.  = 


2No(T^L' 


(y  2  WA 


(14) 


It  can  be  shown  that: 


f(Yb)  = 


E-1 
-  X 


4.  .Y„ 


c{l-i)!(2M- 1)1(0^)  2^  ^ 

7+2 


L-1 


l(j  +  2M)-^ 


,}f2M 


with  M=  L’Nf  and  P 


when  the  variance  of  the 


277" 


amplitudes  of  the  paths  of  arrival  belonging  to  different 
users  are  equal.  We  can  define  Q  as: 

(15) 


I  2NoL’c7 

Then,  the  error  probability  (13)  results  in 


Pe 


<  — (t.'-l  -  j) !  .X 


(-1)' 


j=0 


k=0 


L-l-j-k  J 

2  *!r| 

r  2  J 

where the terms in this series corresponding to k = L'-j+1, L'-j+3, L'-j+5, ... are understood to be zero.

1 

A  =  - 


,  2\2L'M 
a{L'-\)\{2L'  Nf)\\a  j  2 

f  i'-A 


L’  2L'Nf 

r 


U  J 


]^r{j  +  2L'  Nf) 


IV.  Numerical  Results 

The signal format has the following parameters: bandwidth W = (100 ns)^-1 = 10 MHz; spread factor TW = 127; transmission rate 1/T = 78.7 kbit/s. A maximum delay spread of 71 chips has been considered, and thus the intersymbol interference will be negligible. The transmission scheme of Figure 1 has been simulated with up to 6 maximum-period pseudorandom sequences, theoretically mutually orthogonal, corresponding to up to five interfering users. Each user's sequence is multiplied by the symbol to be transmitted (BPSK modulated). Once the signals are modulated, they pass through a frequency-selective, time-variant channel. A different channel has been considered for each user. The value L' is computed as L(L-1), where L is taken as the mean number of paths generated in the impulse responses of the different users, which gives a value of 15 paths. With the variance of the arriving paths assumed to be the same for each channel, $E[\sigma_k^2] = 0.12$, the results of Figure 3 are obtained when only one interfering user is present. Figures 4 and 5 show the probability of error and its upper bound for three and five interfering users, respectively, and finally Figures 6 and 7 show only the upper bound for one, three and five interfering users for different signal-to-noise ratios.
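The quoted signal parameters can be cross-checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those stated in the text.

```python
chip_time = 100e-9                       # 1/W, from W = (100 ns)^-1 = 10 MHz
spread_factor = 127                      # TW
symbol_time = spread_factor * chip_time  # T = 12.7 microseconds
bit_rate = 1.0 / symbol_time             # about 78.7 kbit/s
max_delay_spread = 71 * chip_time        # 7.1 microseconds
L = 15                                   # mean number of paths
L_prime = L * (L - 1)                    # L' = L(L-1)
```

The resulting rate matches the quoted 78.7 kbit/s, the maximum delay spread is shorter than one symbol interval, and L' evaluates to 210.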

V.  Conclusions 

Several interesting facts are obtained regarding the near-far effect in an optimum receiver under multipath operating conditions. These include the effect of an increase in the factor $\gamma$ and in the number of interfering users on the BER and, on the other hand, the limitation of the system capacity as a function of the number of users.

It is important to observe that small increases in the factor $\gamma$ can affect the BER in the event of multiple-user access. In Figure 8, for one interfering user, it can be observed that an increase of $\gamma$ can considerably affect the BER of the system. It should be taken into account that, for optimum detection, multipath is in a certain way desirable, as the receiver, adapted to the channel of the mobile of interest, enhances the appropriate paths and discriminates against the paths arriving from other PN sequences (diversity).
Fig. 3. Probability of error with one interfering user


Fig. 5. Probability of error with five interfering users
ABSTRACT

The decorrelating and minimum mean squared error data detectors for direct sequence code division multiple access (DS-CDMA) communications systems are known to exhibit low vulnerability to the near-far problem. Nevertheless, the performance of these algorithms depends strongly on accurate knowledge of the user propagation delays and is highly sensitive to inter-symbol and/or inter-chip interference such as that produced by frequency-selective fading channels. In this paper, a new sub-optimum symbol-by-symbol detector is presented which is robust in the presence of these two effects.

1.  INTRODUCTION 

Direct-sequence code division multiple access (DS-CDMA) communications systems have recently received increased attention as a promising candidate for emerging mobile digital radio networks. For this reason, much work has been reported on the problem of multi-user detection in DS-CDMA. For asynchronous systems, the standard matched filter-bank detector is known to fail for users of widely disparate power (the so-called “near-far problem”). Moreover, the unrealistically high computational complexity of the optimum (i.e., minimum probability of error) detector [1] has motivated research on sub-optimum multi-user detectors [2]-[3].

Such systems rely on exact knowledge of additional parameters such as carrier phase, signal strength, and propagation delay for each user. However, such receivers can exhibit high sensitivity to errors in estimates of these parameters, especially propagation delay, as was shown to be the case for the decorrelating detector in [4]. Similar problems would be observed for frequency-selective fading channels. This issue has been addressed in [5], where a detection scheme relying on a multipath ray model was proposed. The technique requires estimation of the propagation delays of the individual rays of each user, and data detection is subsequently performed by forming a linear combination of symbol estimates associated with each ray.

This paper presents a simple, direct approach to robust data detection in the presence of uncertain propagation delay estimates and/or frequency-selective fading. The technique is based on a generalization of a maximum signal to interference plus noise ratio (MSINR) symbol-by-symbol detector (the generalization to a block approach is straightforward). The multipath/propagation time uncertainty is taken into account by a simple statistical model. The resulting detectors are shown to be near-far resistant and insensitive to “small” errors in propagation delay estimation and/or frequency-selective multipath channels with “small” delay spread. The technique may also be useful in cases where accurate timing estimates are available but can be updated only at relatively low frequency. In this case, the receiver is robust in the presence of small changes in timing which take place over the propagation time update interval. As with other sub-optimum approaches, computational complexity is linear in the number of users.

HCM/CHRXCT-930405, PRONTIC/CICYT TIC95-1022-C05-01 and CIRIT/Generalitat de Catalunya GRQ93-3021.

2.  PROBLEM  FORMULATION 

Consider a K-user asynchronous DS-CDMA system nominally operating over a channel with additive white Gaussian noise (AWGN). Binary phase shift keying (BPSK) modulation is used. Using the notation of [4, 5], the symbol interval will be denoted as T and the chip interval as $T_c = T/N$, where N is the number of chips per symbol. The kth user's code waveform is of unit amplitude and is denoted by $b_k(t)$. It can be expressed as a pulse amplitude modulation of $\{c_k(n)\}_{n=0}^{N-1}$, the pseudo-noise (PN) sequence associated with the kth user:

$$b_k(t) = \sum_{n=0}^{N-1} c_k(n)\, p(t - nT_c) \qquad (1)$$

where $p(t)$ is a pulse whose duration, in general, exceeds the chip interval, $T_c$. The data sequence for the kth user, $d_k(m) \in \{-1, +1\}$, is pulse amplitude modulated by a single period of the corresponding code waveform, resulting in a baseband signal written as:

$$s_k(t) = \sum_{m=-\infty}^{\infty} d_k(m)\, b_k(t - mT) \qquad (2)$$

The transmitted signal is the product of the baseband signal and the carrier $\sqrt{2\gamma_k} \cos(\omega_c t + \theta_k)$, where $\omega_c$, $\gamma_k$ and $\theta_k$ respectively denote carrier frequency, kth user power and carrier phase.

In general, the channel associated with the kth user can be modeled as a linear time-varying system $h_k(t, \tau)$, which denotes the channel response at time t to an impulse applied $\tau$ seconds in the past. The received signal is:

r^(t)  =  n  (t)  +  (^) 

where $n'(t)$ is AWGN of two-sided power spectral density level $N_0/2$. The equivalent complex baseband representation of the signal is given as:
k=l 

(4) 

If the channel is modeled as a simple constant delay (possibly not precisely known), then its impulse response can be written as $h_k(t, \tau) = \delta(\tau - \tau_k)$, where it will be assumed that $\tau_k \in [-T/2, T/2)$. In this case the received signal is:

$$r(t) = n(t) + \sum_{k=1}^{K} \sqrt{\gamma_k}\, e^{j\phi_k}\, s_k(t - \tau_k) \qquad (5)$$

where $\phi_k = -\omega_c \tau_k + \theta_k$. It is this model that will be used throughout the paper.

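Next define the received signal vector $\mathbf{r}(m) \in \mathbb{C}^{QN}$ as the sum of a signal vector $\mathbf{s}(m)$ and noise vector $\mathbf{n}(m)$:

$$\mathbf{r}(m) = \left[ r(mT),\, r\!\left(mT + \tfrac{T}{QN}\right), \ldots,\, r\!\left([m+1]T - \tfrac{T}{QN}\right) \right]^T = \mathbf{s}(m) + \mathbf{n}(m) \qquad (6)$$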


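where $(\cdot)^T$ denotes transpose. It is not difficult to verify that the signal vector can be expressed as:

$$\mathbf{s}(m) = \mathbf{J} \sum_{k=1}^{K} \sum_{i=-1}^{1} \sqrt{\gamma_k}\, e^{j\phi_k}\, \mathbf{b}_k(\tau_k + iT)\, d_k(m + i) \qquad (7)$$

$$\mathbf{J} = [\mathbf{0}_{QN}, \mathbf{I}_{QN}, \mathbf{0}_{QN}] \in \mathbb{R}^{QN \times 3QN} \qquad (8)$$

$$[\mathbf{b}_k(\tau_k + iT)]_n = \sqrt{2}\, b_k([n-1]T/QN - [i+1]T - \tau_k) \qquad (9)$$
$$n \in \{1, 2, \ldots, 3QN\}, \quad i \in \{-1, 0, 1\}$$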

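where the subscripts on the square zero and identity matrices in (8) denote the dimension of these matrices, and $[\cdot]_n$ denotes the nth element of a vector. The above expression can be written more compactly in matrix form:

$$\mathbf{s}(m) = \mathbf{J} \sum_{i=-1}^{1} \mathbf{B}(\boldsymbol{\tau} + i\mathbf{T})\, \mathbf{d}(m + i) \qquad (10)$$

$$\mathbf{B}(\boldsymbol{\tau} + i\mathbf{T}) = [\mathbf{b}_1(\tau_1 + iT), \cdots, \mathbf{b}_K(\tau_K + iT)]$$
$$\mathbf{d}(m + i) = [\sqrt{\gamma_1}\, e^{j\phi_1} d_1(m + i), \cdots, \sqrt{\gamma_K}\, e^{j\phi_K} d_K(m + i)]^T$$
$$\boldsymbol{\tau} = [\tau_1, \cdots, \tau_K], \qquad \mathbf{T} = T\,[1, \cdots, 1] \quad (K \text{ elements})$$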


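The correlation matrix of the received vector is:

$$\mathbf{R} = E[\mathbf{r}(m)\mathbf{r}^H(m)] = \sum_{i=-1}^{1} \mathbf{J}\mathbf{B}(\boldsymbol{\tau} + i\mathbf{T})\, \boldsymbol{\Gamma}\, \mathbf{B}^H(\boldsymbol{\tau} + i\mathbf{T})\mathbf{J}^T + \sigma^2 \mathbf{I} \qquad (11)$$

$$\boldsymbol{\Gamma} = E[\mathbf{d}(m + i)\, \mathbf{d}^H(m + i)] \qquad (12)$$

where $E[\cdot]$ is the expectation operator. It is assumed that each user's symbols are uncorrelated with those of other users (implying that $\boldsymbol{\Gamma}$ is diagonal with diagonal elements equal to user powers).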


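Let us define the signal-to-interference-and-noise ratio (SINR) and the mean squared error (MSE) as:

$$\mathrm{SINR}(\mathbf{w}_k) = \frac{\mathbf{w}_k^H \mathbf{R}_{s_k} \mathbf{w}_k}{\mathbf{w}_k^H \mathbf{R}_{[i+n]_k} \mathbf{w}_k}, \qquad \mathrm{MSE}(\mathbf{w}_k) = E\left[ |d_k(m) - \mathbf{w}_k^H \mathbf{r}(m)|^2 \right] \qquad (13)$$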

The “signal” correlation matrix is defined as the correlation matrix of the signal vector in the presence only of the mth symbol of the kth user:

$$\mathbf{R}_{s_k} = \gamma_k\, \mathbf{J}\mathbf{b}_k(0)\mathbf{b}_k^H(0)\mathbf{J}^T \qquad (14)$$

where it is noted that, without loss of generality, the associated propagation delay is set to zero: $\tau_k = 0$. Conversely, the interference-plus-noise correlation matrix $\mathbf{R}_{[i+n]_k}$ is defined as the correlation matrix of the received vector in the presence only of the noise, the remaining K - 1 users and the (m - 1)th and (m + 1)th symbols of the kth user:

$$\mathbf{R}_{[i+n]_k} = \mathbf{R} - \mathbf{R}_{s_k}. \qquad (15)$$

Consider now the maximum SINR (MSINR) symbol-by-symbol receiver for user k. It is well known that this problem can be solved using generalized eigenanalysis. The general solution of the MSINR receiver can then be written as:

$$\mathbf{w}_{o_k(\mathrm{MSINR})} = \arg\max_{\mathbf{w}_k} \mathrm{SINR}(\mathbf{w}_k) = \alpha\, \mathbf{e}_{\max} \qquad (16)$$

where $\mathbf{e}_{\max}$ denotes the generalized eigenvector associated with the maximum eigenvalue of the matrix pencil $(\mathbf{R}_{s_k}, \mathbf{R}_{[i+n]_k})$, and $\alpha$ is an arbitrary (non-zero) constant. We can also consider a minimum mean squared error (MMSE) symbol-by-symbol receiver for user k similar to [6]:

$$\mathbf{w}_{o_k(\mathrm{MMSE})} = \arg\min_{\mathbf{w}_k} \mathrm{MSE}(\mathbf{w}_k) = \beta\, \mathbf{R}^{-1}\mathbf{p}_k, \qquad \mathbf{p}_k = E[\mathbf{r}(m)\, d_k^*(m)] \qquad (17)$$

where $\beta$ is a constant. In the particular case of a rank-one matrix $\mathbf{R}_{s_k}$, it is well known that $\mathbf{w}_{o_k(\mathrm{MSINR})} = \mathbf{w}_{o_k(\mathrm{MMSE})}$. It is also well known that if the noise power is very low compared to the powers of the interfering users, the MMSE receiver acts as a decorrelator, completely nulling the effect of the interfering users. That is to say, the magnitude of the receiver output for the kth user will be approximately zero for each of the interfering users at the times for which the output provides an estimate of the kth user's symbols. However, under these low-noise conditions, it has recently been shown that inaccurate timing estimates for the users can drastically reduce performance, resulting in high receiver sensitivity to near-far effects [4]. The problem addressed in this paper is the design of near-far resistant receivers that are robust in the presence of such timing errors.
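The rank-one equivalence between the MSINR and MMSE solutions can be checked numerically on a toy case. In the sketch below the 2x2 real matrices, the vector `b` and the function name are hypothetical. For a rank-one signal matrix $\gamma\, \mathbf{b}\mathbf{b}^T$, the SINR-maximising direction is proportional to $\mathbf{R}_{[i+n]}^{-1}\mathbf{b}$ and the MMSE direction to $\mathbf{R}^{-1}\mathbf{b}$, and the matrix inversion lemma makes the two parallel.

```python
def solve_2x2(A, y):
    """Solve A x = y for a 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]

gamma = 2.0                              # hypothetical power of the desired user
b = [1.0, 0.5]                           # hypothetical signature samples
R_in = [[1.5, 0.4], [0.4, 1.0]]          # hypothetical interference-plus-noise matrix
# Total correlation matrix: rank-one signal part plus interference and noise.
R = [[R_in[i][j] + gamma * b[i] * b[j] for j in range(2)] for i in range(2)]

w_msinr = solve_2x2(R_in, b)             # direction maximising the SINR
w_mmse = solve_2x2(R, b)                 # direction of R^{-1} p_k, with p_k proportional to b
# A vanishing 2-D cross product means the two vectors are parallel.
cross = w_msinr[0] * w_mmse[1] - w_msinr[1] * w_mmse[0]
```

Once the signal matrix is no longer rank one, as happens after averaging over timing errors, this proportionality no longer holds in general.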

3.  EFFECT  OF  TIMING  ERRORS 

In practice, since $\boldsymbol{\tau}$ is not directly available, estimates of the propagation delays, $\hat{\boldsymbol{\tau}} = [\hat{\tau}_1, \cdots, \hat{\tau}_K]$, are used in the design of the receivers. In this section, the effects of timing error on output SINR and MSE as defined in (13) are quantified.

It will be assumed that the timing estimates, $\hat{\tau}_k$, can be expressed as a sum of the true propagation delay, $\tau_k$, and a zero-mean Gaussian random variable, $\epsilon_k$, of variance $\sigma_{\epsilon_k}^2$:

$$\hat{\tau}_k = \tau_k + \epsilon_k, \qquad E_\epsilon[\epsilon_k] = 0, \qquad E_\epsilon[\epsilon_k \epsilon_{k'}] = \sigma_{\epsilon_k}^2\, \delta(k - k') \qquad (18)$$

where $E_\epsilon$ is the expectation over the K propagation delay errors. The above model also accounts for the multipath effects appearing in frequency-selective fading channels (with small delay spread). Then, in the presence of timing errors, a corresponding average output SINR can be defined as:


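$$\overline{\mathrm{SINR}}(\mathbf{w}_k) = \frac{\mathbf{w}_k^H \bar{\mathbf{R}}_{s_k} \mathbf{w}_k}{\mathbf{w}_k^H \bar{\mathbf{R}}_{[i+n]_k} \mathbf{w}_k} \qquad (19)$$

$$\bar{\mathbf{R}}_{s_k} = E_\epsilon[\mathbf{R}_{s_k}], \qquad \bar{\mathbf{R}}_{[i+n]_k} = E_\epsilon[\mathbf{R}_{[i+n]_k}] \qquad (20)$$

Similarly, a corresponding average MSE can be defined as:

$$\overline{\mathrm{MSE}}(\mathbf{w}_k) = E_\epsilon E\left[ |d_k(m) - \mathbf{w}_k^H \mathbf{r}(m)|^2 \right] \qquad (21)$$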


We  propose  the  definitions  (19)  and  (21)  as  measures  of 
performance  in  presence  of  timing  uncertainty. 
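The effect of averaging over a Gaussian timing error can be illustrated on a single frequency component. A delay error ε multiplies a component at frequency f by exp(-j2πfε), and its expectation under the Gaussian model of (18) is exp(-(2πfσ)²/2). The sketch below compares a Monte Carlo estimate with this closed form. The frequencies, the value of `sigma` and the function names are hypothetical.

```python
import cmath
import math
import random

def mean_phase_factor(freq, sigma, trials=50_000, seed=1):
    """Monte Carlo estimate of E[exp(-j*2*pi*freq*eps)] for eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(trials):
        acc += cmath.exp(-2j * math.pi * freq * rng.gauss(0.0, sigma))
    return acc / trials

def closed_form(freq, sigma):
    """Characteristic function of the Gaussian: exp(-(2*pi*freq*sigma)^2 / 2)."""
    return math.exp(-0.5 * (2.0 * math.pi * freq * sigma) ** 2)

sigma = 0.05    # hypothetical timing-error standard deviation, in units of T
estimates = {f: (mean_phase_factor(f, sigma), closed_form(f, sigma))
             for f in (1.0, 3.0, 6.0)}
```

Higher frequencies are attenuated more strongly, so averaging the correlation matrices over the timing error acts like a low-pass smoothing of each user's waveform.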


4.  ROBUST  DETECTOR 

In the absence of timing errors, the decorrelating detector (and the MMSE detector as $\sigma^2 \to 0$) completely nulls out the influence of all interfering signature waveforms at their specified timings (as defined by their propagation delays). In the presence of timing errors, such nulling is not guaranteed. Even for small errors the reduction in performance, as measured by the decrease in average SINR (19) or by the increase in average MSE (21), can be very significant, especially for high near-far scenarios. The effect of the timing errors, as seen by averaging over the propagation delays, corresponds to a sort of temporal smearing of each of the K users' signals. A detector which is designed to be robust in the presence of timing errors should take this smearing effect into account in order to create broad temporal nulls for the interfering users. To this end, we can define the robust MSINR and robust MMSE receivers as:

arg  max  SiNR(wO  (22) 

wjt 

Gmaa: 

arg  min  MSB(wfc)  (23) 

w*. 

/3R"^Pjt  Pfc  =  E^E  [r(m)dfc(Tn)] 


(MSINR) 


(MMSE) 


Fourier  Transform  (DFT)  matrix  as: 


1 

"  1 

1 

v/3QA 

1 

M  =  (3QN-l)/2, 

Now,  resorting  to  the  time-shift  property,  the  DFT  of  the 
users’  code  waveforms  can  be  writen  as: 


A(T  +  e)  =  FB(r  +  e)  =  A(T)©V(e) 

A(t)  =  A(0)©V(r)  (24) 

where  ©  denotes  the  element-wise  Schur  product,  and  the 
linear-phase  matrix  V (x)  and  vector  v  (a:)  are  defined  res- 
pectively  as: 

V(x)  =  [v(a;i),v(a;2),---,v(xK)]  (25) 

-i2,rMi/3T  ~j2w(.M-l)x/3T  Jiir 

e  ,  e  > '  *  *  j  ^ 

The  unitary  property  of  F  and  the  linearity  of  the  eiqiecta- 
tion  operator  imply: 

R=JF"[(A  (r)  FA"  (r))©  S©Q  )]FJ"  -f-  a^l 
1 

S  =  ^  v(iT)v"(tT)  (26) 

i=-l 

R,,  =  7fcJF"  [(afc  (0)  af  (0))  ©  Q  (aL.,)]  Fj"  (27) 
~  ^  ~  (26) 

where  as  in  [7]: 

[Q  [v  (^)  (6)]]p= 

The  problem  of  the  exact  propagation  delays  appearing  in 
the  parameterized  computation  of  R  in  (26)  can  be  ad¬ 
dressed  by  simply  using  the  estimated  delays,  ffc,  in  the 
arguments  of  the  V(*)  and  v(*)  in  the  computation  of  A  (t) 
using  (24)  and  (25). 

Thus,  in  summary,  the  new  robust  MSINR  receiver  filter 
is  formed  by  using  (27)  and  (28)  with  estimated  propagation 
delays  in  (22).  The  robust  receiver  (22)  offers,  as  verified 
in  the  next  section,  the  following  compromise  with  respect 
to  the  optimum  receiver  defined  in  (16): 

SINR(woJ  >  SINR(w,.J  «SINR(wrJ  »  SINR(woJ 




5.  RESULTS 


Due to the expectation over the kth propagation delay error,
$\epsilon_k$, the “signal” correlation matrix, $R_{s_k}$, has lost the
rank-one property and, therefore, the two solutions are not
equal in this case: $w_{r_k}(\mathrm{MSINR}) \neq w_{r_k}(\mathrm{MMSE})$.

Finally, transformation to the Fourier domain, where
a time-shift corresponds to a linear phase, will aid in the
formulation of simple closed-form expressions for the average
correlation matrices (20). In particular, we define a Discrete


In this section, results of computer simulations of the performance
of the new technique (22) and a comparison with
that of the ordinary MMSE/MSINR receiver (17) are presented.
Consider a K = 3 user system with Q = 2 samples per
chip and N = 31 chips per symbol. Nyquist pulses with
roll-off 0.5 are used and, for convenience, T = 1. The performance
of the receivers for user k = 1 is considered with $\tau_1 = 0$,
$\tau_2 = T/3$, and $\tau_3 = -T/3$ and user powers $\gamma_1 = 1$, $\gamma_2 = 50$,
and $\gamma_3 = 50$. To gain insight into the effect of the new
procedure, consider Fig. 1, the output of the conventional
receiver (designed for the above scenario) for user one when
only one symbol of user two is present at its input. The receiver
succeeds in placing two sharp nulls at times t = 0 and
t = T over the interfering user. Now, consider the same
experiment but with the robust receiver of (22), as shown
in Fig. 2. This time two broad temporal nulls are placed at
times t = 0 and t = T. These broad nulls are what provide
the robustness to timing uncertainty.

Next, for the above user powers and noise power $\sigma^2$ =
0.1, Fig. 3 shows the output SINR as defined in (19) for
the ordinary receiver (assumed timing error variance zero)
and the robust receiver (assumed timing error variance
$\sigma^2_{\epsilon_{max}} = 0.003$) as the true timing error variance is varied.
The curves indicate that, even for a relatively low near-far
ratio and low timing error variance, the conventional receiver
is highly sensitive to uncertainty in the propagation
delays, while the robust receiver offers nearly constant performance
with timing error variance. Lastly, Fig. 4 shows
performance for the conventional and robust receivers as a
function of the ratio of interferer power, $\gamma_2 = \gamma_3$, to desired
user power, $\gamma_1$. The true timing error variance as well as
that used in the design of the robust receiver are again
set to 0.003. The conventional receiver is far more
sensitive to interferer power than the robust technique.

6.  CONCLUSIONS 

A new method for the design of multi-user detectors which
are robust in the presence of propagation time estimation
errors and/or frequency selective multipath (with delay
spread on the order of a few chips) has been presented. The
new detector is near-far resistant and offers greatly improved
performance over the conventional detector for a variety
of scenarios. The technique is also useful for robust detection
in cases where highly accurate timing estimates are
available but can only be updated relatively infrequently.
Figure 1: Conventional receiver one output vs. time


Figure  2:  Robust  receiver  one  output  vs.  time 


Figure  3:  SINR  vs.  timing  error  var.  (-  conv.,  -  -  rob.) 
Abstract

The  use  of  adaptive  arrays  in  a  multi-rate  multi-media 
CDMA  network  is  analyzed,  and  simulation  results  are  pro¬ 
vided  for  a  candidate  two-rate  system.  Our  proposed  ap¬ 
proach  uses  reference  signal-based  adaptation  (LMS)  for 
antenna  weight  control.  Simulation  results  show  adaptive 
arrays  can  significantly  enhance  the  multi-media  services 
that  can  be  provided  by  a  CDMA  network.  In  addition,  pre¬ 
liminary  simulation  results  for  RLS  and  LMS  weight  control 
algorithms  are  presented. 


1.  Introduction 

Due  to  the  growing  user  demands  for  wireless  communi¬ 
cation  services,  there  has  been  much  interest  in  finding  alter¬ 
native  methods  of  increasing  capacities  of  wireless  systems 
beyond  that  achievable  by  today’s  systems.  One  emerg¬ 
ing  concept  that  has  been  receiving  much  attention  is  Space 
Division  Multiple  Access  (SDMA).  In  SDMA,  performance 
improvements  are  realized  by  exploiting  the  spatial  distribu¬ 
tion  of  users  through  spatial  filtering  provided  by  the  use  of 
antenna  arrays  at  the  base  station.  The  application  of  SDMA 
techniques to wireless CDMA systems has been investigated
in [3, 5, 7, 8], and these studies show that significant
increases in cellular system capacities are realizable using
antenna arrays. These papers, however, focus on cellular
systems  supporting  a  single  user  type,  typically  considered 
to  be  voice. 

Future  wireless  systems,  such  as  PCS,  will  be  required 
to  handle  multi-media  traffic  types  -  voice,  data,  and  video 
-  which  can  have  different  data  rates,  as  well  as,  differ¬ 
ent  quality  of  service  (BER)  requirements.  The  flexibil¬ 
ity  of  CDMA  in  accommodating  multi-rate  users  makes  it 
a  promising  technique  for  future  wireless  communication 
networks.  But,  as  is  the  case  for  any  conventional  CDMA 
system,  capacity  is  limited  by  practical  considerations,  avail¬ 
able  signal  power  and  system  bandwidth.  In  multi-media 
networks  the  capacity  determines  the  quantity  and  diversity 


of traffic that can be supported by the network. In order
to make multi-media systems economically viable for large-scale
implementations, methods of increasing capacities are
needed.  Therefore,  in  this  paper  we  analyze  the  implemen¬ 
tation  of  adaptive  arrays  in  a  multi-rate  multi-media  CDMA 
network  and  show  their  benefits  in  enhancing  the  multi- 
media  services  that  can  be  provided  by  the  network. 

In  addition,  a  critical  issue  associated  with  the  implemen¬ 
tation  of  adaptive  arrays  is  the  method  for  antenna  weight 
control.  Our  proposed  approach  uses  reference  signal-based 
adaptation  (LMS)  for  antenna  weight  control,  as  opposed  to 
the  "code  filtering"  approach  introduced  in  [7, 4].  In  this  pa¬ 
per,  we  will  take  a  preliminary  look  at  the  reference  signal 
weight  control  algorithms,  LMS  and  RLS,  for  the  multi- 
media  CDMA  network. 

In  section  2,  our  system  model  is  described.  In  section 
3,  the  BER  analysis  for  a  multi-rate  CDMA  system  with 
array  processing  is  presented  along  with  numerical  results. 
In  section  4,  we  present  simulation  results  for  LMS  and  RLS 
antenna  weight  control  algorithms.  And  finally  in  section  5, 
we  conclude  with  some  final  remarks. 

2.  System  Model 

In this analysis, we consider a single-cell fixed chip-rate
multi-media CDMA system which provides service to S different
user types with data rates given by $\{R_1, R_2, \ldots, R_S\}$.
For simplicity, the k-th type-i user will be denoted as user
(i,k). Since the system has a fixed chip-rate for all users,
each user will have a processing gain dependent on its data
rate, $N_i = R_c / R_i$, where $R_c$ is the common chip rate.
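
As a small worked example of the relation N_i = R_c / R_i, the sketch below computes the processing gains for the two data rates used later in the numerical results (9600 and 19200 bps). The chip rate of 1.2288 Mchip/s is an assumption: it is not stated in the text but is the value implied by a processing gain of 128 at 9600 bps.

```python
def processing_gains(chip_rate_hz, data_rates_bps):
    """Processing gain N_i = R_c / R_i for each user type."""
    return [chip_rate_hz / rate for rate in data_rates_bps]

# Assumed common chip rate (implied by N_1 = 128 at R_1 = 9600 bps).
chip_rate = 1.2288e6
gains = processing_gains(chip_rate, [9600, 19200])
```

Doubling the data rate at a fixed chip rate halves the processing gain, which is why the higher-rate user type needs more received power for the same bit energy.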

The  base  station  consists  of  a  uniformly  spaced  linear 
array  of  M  omnidirectional  antenna  elements.  The  antenna 
spacing  is  assumed  to  be  half-wavelength,  and  time  delays 
due  to  propagation  across  the  array  are  modeled  as  phase 
shifts (i.e., narrowband array assumption). Figure 1 shows a
block  diagram  of  the  base  station  receiver.  The  generation  of 
the  reference  signal  through  the  feedback  loop  was  originally 
proposed  by  Compton  [1],  and  will  be  discussed  more  in 
section  4. 
For this analysis, we will be interested in only the reverse
link (mobile to base station) performance. We assume
there  exist  direct  paths  between  the  mobiles  and  the  base 
station.  Multipath  is  not  considered,  but  will  be  addressed 
in  a  future  paper.  The  reverse  link  modulation  is  BPSK  with 
coherent  demodulation.  All  type  i  users  are  assumed  power 
controlled,  such  that  the  average  received  signal  power  at 
each  antenna  element  is  Pi.  A  residual  power  control  error 
for  each  user  is  included,  and  is  modeled  as  a  zero-mean 
log-normal random variable with variance $\sigma_{dB}^2$.

Figure 1. Base station receiver for desired user

The complex representation of the received signal for user (i,k) is
given as:

$x_{i,k}(t) = \sqrt{P_i 10^{\epsilon_{i,k}/10}}\, \chi_i\, b_{i,k}(t - \tau_{i,k})\, c_{i,k}(t - \tau_{i,k})\, e^{j(\omega_c t - \theta_{i,k})}\, a_{i,k}$

where $10^{\epsilon_{i,k}/10}$ is the term representing the residual power
control error, $\omega_c$ is the common carrier frequency, and $\tau_{i,k}$ is
the time delay with respect to the reference antenna element.
We assume without loss of generality that $\tau_{i,k}$ is a
random variable uniformly distributed over $[0, T_i = 1/R_i)$.
$\theta_{i,k} = \phi_{i,k} + \tau_{i,k}\omega_c$ is the phase shift with respect to the
reference antenna element, and is modeled as a random
variable uniformly distributed over $[0, 2\pi)$. $\chi_i$ is a binary
random variable representing the activity of the type i user,
with $\Pr[\chi_i = 1]$ equal to the activity factor of that user type.
$b_{i,k}(t)$ represents the data waveform
consisting of an i.i.d. sequence of rectangular pulses of
amplitude ±1 with duration $T_i$. $c_{i,k}(t)$ represents the code
waveform consisting of a sequence of rectangular pulses of
amplitude ±1 with duration $T_c$. And $a_{i,k}$ is the antenna
response vector given by:

$a_{i,k} = [1, e^{-j\gamma_{i,k}}, \ldots, e^{-j(M-1)\gamma_{i,k}}]^T, \qquad \gamma_{i,k} = \frac{2\pi d}{\lambda} \sin\beta_{i,k}$

where $\beta_{i,k}$ is the angular location of user (i,k) with respect
to the broadside of the base station antenna array.

The total received signal at the antenna array is:

$x(t) = \sum_{i=1}^{S} \sum_{k=1}^{K_i} x_{i,k}(t) + n(t)$

where $K_i$ is the number of type i users in the cell, and $n(t)$
is a complex Gaussian process with

$E[n^*(t) n^T(\tau)] = \sigma^2 I \delta(t - \tau)$

3.  Multi-rate  CDMA  BER  Performance 
3.1.  BER  Analysis 

In this section we present our analytical approach for
evaluating the average BER for user type s. The combining
of individual antenna element responses is optimal in
the sense that it minimizes the squared error between the
antenna array output and a given reference signal. To conduct
our analysis, we consider an arbitrary scenario which
consists of $(K_1, K_2, \ldots, K_S)$ users whose angular locations
are given by $\beta = (\beta_{1,1}, \beta_{1,2}, \ldots, \beta_{S,K_S})$. In addition, the
instantaneous residual power control errors for the users are
given by $\epsilon = (\epsilon_{1,1}, \epsilon_{1,2}, \ldots, \epsilon_{S,K_S})$, where $\epsilon_{i,k}$ is the instantaneous
residual power control error for user (i,k).

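Our desired user is assumed to be (s,p), for which the
optimal antenna weights are given by the Wiener solution,
where the array correlation matrix is

$R_{xx} = E[x^*(t) x^T(t)] = \sum_{i=1}^{S} \sum_{k=1}^{K_i} P_i 10^{\epsilon_{i,k}/10} a_{i,k}^* a_{i,k}^T + \sigma^2 I$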
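
To make the Wiener weight computation concrete, the following is a minimal sketch for a half-wavelength uniform linear array. It assumes unit activity factors, ignores the power control error, and takes the cross-correlation with the reference signal to be proportional to the conjugate steering vector of the desired user; the helper names and the three-user scenario are hypothetical and are not taken from the paper.

```python
import numpy as np

def steering(beta_rad, m):
    """Response of an m-element half-wavelength linear array at angle beta."""
    gamma = np.pi * np.sin(beta_rad)           # 2*pi*d/lambda with d = lambda/2
    return np.exp(-1j * gamma * np.arange(m))

def wiener_weights(A, powers, desired, noise_var):
    """Weights w = R^{-1} a_d^*, with R = E[x^* x^T] built from the user set."""
    m = A.shape[0]
    R = (A.conj() * powers) @ A.T + noise_var * np.eye(m)
    return np.linalg.solve(R, A[:, desired].conj())

def output_sinr(w, A, powers, desired, noise_var):
    """SINR at the output y = w^T x for the desired user."""
    per_user = np.abs(w @ A) ** 2 * powers     # |w^T a_k|^2 P_k for each user
    signal = per_user[desired]
    noise = noise_var * np.sum(np.abs(w) ** 2)
    return signal / (per_user.sum() - signal + noise)

m = 5
angles = np.deg2rad([0.0, 20.0, -30.0])        # hypothetical user locations
powers = np.array([1.0, 4.0, 4.0])             # hypothetical received powers
A = np.stack([steering(b, m) for b in angles], axis=1)

w_opt = wiener_weights(A, powers, desired=0, noise_var=0.1)
w_conv = A[:, 0].conj()                        # simple beam pointed at the desired user
sinr_opt = output_sinr(w_opt, A, powers, 0, 0.1)
sinr_conv = output_sinr(w_conv, A, powers, 0, 0.1)
```

Comparing `sinr_opt` with `sinr_conv` shows how much of the gain comes from suppressing interferers rather than from simple beam pointing.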

The output of the adaptive array for our desired user is:

$y_{s,p}(t) = \mathrm{Re}[w_{s,p}^T x(t)]$

$= \sum_{i=1}^{S} \sum_{k=1}^{K_i} \sqrt{P_i 10^{\epsilon_{i,k}/10}}\, \chi_i\, b_{i,k}(t - \tau_{i,k})\, c_{i,k}(t - \tau_{i,k}) \sum_{m=1}^{M} W_m \cos(\omega_c t - \theta_{i,k} - (m-1)\gamma_{i,k} + \psi_m) + n'(t)$

where $n'(t) = \mathrm{Re}[w_{s,p}^T n(t)]$ is a zero-mean Gaussian random
variable.

The array output is processed by a correlation receiver
for the desired user. Assuming $\tau_{s,p} = \theta_{s,p} = 0$, the
output of the correlation receiver is given by:

$z_{s,p}(nT_s) = \int_{(n-1)T_s}^{nT_s} y_{s,p}(t)\, c_{s,p}(t) \cos\omega_c t \, dt$


Using Gaussian approximations for the other-user interference,
and using the results developed in [6], we compute
the desired user's SINR and BER for the given scenario $(\beta, \epsilon)$
as

$\mathrm{SINR}_{s,p}(\beta, \epsilon) = \frac{E[z_{s,p}(nT_s) \mid \beta, \epsilon]}{\sigma[z_{s,p}(nT_s) \mid \beta, \epsilon]}$

$\mathrm{BER}_{s,p}(\beta, \epsilon) = Q(\mathrm{SINR}_{s,p}(\beta, \epsilon))$

And we can now obtain an average BER for the type s user
by

$\overline{\mathrm{BER}}_s = \int\!\!\int \mathrm{BER}_{s,p}(\beta, \epsilon)\, f(\beta)\, f(\epsilon)\, d\beta\, d\epsilon$
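
The averaging over scenarios can be approximated by Monte Carlo sampling. The sketch below is illustrative only: `q_function` is the standard Gaussian tail probability, while `toy_sinr` and `draw_error_db` are hypothetical stand-ins for the scenario-dependent SINR and the scenario distribution, which in the paper depend on user locations and power control errors.

```python
import math
import random

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def average_ber(sinr_of_scenario, draw_scenario, trials=5000, seed=0):
    """Monte Carlo estimate of E[Q(SINR)] over randomly drawn scenarios."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += q_function(sinr_of_scenario(draw_scenario(rng)))
    return total / trials

# Hypothetical toy scenario: only a log-normal power error (in dB) is random.
def draw_error_db(rng):
    return rng.gauss(0.0, 1.0)

def toy_sinr(error_db):
    # Amplitude-ratio SINR scaled by the square root of the power error.
    return 3.0 * 10 ** (error_db / 20)

avg_ber = average_ber(toy_sinr, draw_error_db)
```

Because Q is convex over positive arguments, the average BER is dominated by the unfavourable scenarios, which is why averaging the BER differs from evaluating Q at the average SINR.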


average  BER  for  both  users  as  we  go  from  1  to  7  antenna 
elements  at  the  base  station.  With  one  antenna  performance 
is  unacceptable,  while  with  five  and  seven  antenna  elements 
we  are  able  to  maintain  BER  <  10“^  for  all  users.  Also, 
note  that  each  user  experiences  a  graceful  degradation  in 
BER  as  we  increase  the  number  of  type  2  users. 


3.2.  Numerical  Results 

The complexities associated with analyzing adaptive arrays
do not permit a closed-form expression for BER; therefore,
in this subsection we use Monte Carlo simulations to
evaluate the average BER performance of a candidate two-rate
CDMA system.

We consider a CDMA system providing service to two
user types with data rates $R_1$ = 9600 bps and $R_2$ = 19200
bps, and associated processing gains $N_1$ = 128 and $N_2$ =
64. We assume a single 120° cell sector with 1, 3, 5, and 7
antenna elements. The variance of the residual power control
error is fixed at 0.1 dB. The number of type 1 users, $K_1$,
is fixed at 60, while the number of type 2 users, $K_2$, is varied
from 5 to 25. The users are assumed uniformly distributed
over the cell sector. The average BER for both user types is


Figure 2. Average BER for user type 1 and 2
with $E_b^{(1)}/N_0 = E_b^{(2)}/N_0 = 0$ dB (horizontal axis: number of type 2 users)
Figure 3. Average BER for user type 1 and 2
with $E_b^{(1)}/N_0 = 0$ dB and $E_b^{(2)}/N_0 = 3$ dB (horizontal axis: number of type 2 users)

Next, we consider the case when $E_b^{(1)}/N_0$ is kept at 0
dB, and $E_b^{(2)}/N_0$ is increased to 3 dB. This corresponds to
$P_2 = 4P_1$. Figure 3 shows BER performance under this
condition.  Here,  user  type  1  BER  performance  has  de¬ 
graded  slightly,  and  degrades  more  rapidly  with  the  addition 
of  type  2  users.  However,  user  type  2  realizes  over  an  order 
of  magnitude  improvement  in  average  BER.  So,  multiple 
users  with  different  qualities  of  service  can  be  supported  by 
appropriately  adjusting  user  signal  powers.  If  we  increase 
the  bit  energies  for  both  users,  while  maintaining  the  same 
ratio  in  their  powers,  we  can  realize  still  further  improve¬ 
ments  in  BER  performance.  As  we  see,  antenna  arrays  offer 
another  "resource",  in  addition  to  user  signal  power  and 
system  bandwidth,  which  can  be  used  to  overcome  limita¬ 
tions  in  capacities,  and  improve  the  multi-media  services  of 
CDMA  networks. 
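
The power ratios quoted for the two power control cases follow from the bit energy being the received power divided by the data rate, E_b = P/R, with a common noise density. A minimal sketch of that arithmetic, using a hypothetical helper `power_ratio`, is:

```python
def power_ratio(rate_1_bps, rate_2_bps, ebno_1_db, ebno_2_db):
    """Return P2/P1 given E_b = P/R for each user type and a common N0."""
    ebno_gap = 10 ** ((ebno_2_db - ebno_1_db) / 10)   # linear E_b ratio
    return (rate_2_bps / rate_1_bps) * ebno_gap

equal_ebno = power_ratio(9600, 19200, 0.0, 0.0)   # equal bit energies
boosted = power_ratio(9600, 19200, 0.0, 3.0)      # type 2 raised by 3 dB
```

With equal bit energies the higher-rate user needs twice the power; raising its bit energy by 3 dB (a factor of about 2) brings the ratio to about 4.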


analyzed  under  various  power  control  approaches. 

First, we consider the case of equal received bit energy-to-noise
ratio at the antenna elements, $E_b/N_0$, for both users.
Figure 2 shows the average BER for user type 1 and 2 for
$E_b^{(1)}/N_0 = E_b^{(2)}/N_0 = 0$ dB. This corresponds to $P_2 =
2P_1$. Also, both users are assumed to have an activity factor
of 1. From these results we clearly see the improvements in


4.  Antenna  Weight  Control  Algorithm 

In  the  previous  section  we  evaluated  the  performance 
gains  of  a  multi-rate  CDMA  network  when  adaptive  arrays 
are  implemented  at  the  base  station.  These  performance 
gains  were  based  on  optimal  combining  of  antenna  responses 
assuming  optimal  antenna  weights,  and  thus  represent  an 
upper bound on that achievable in practice. The degree to
which  we  approach  this  upper  bound  is  dependent  on  the 
performance  of  weight  control  algorithms.  In  this  paper, 
we  propose  an  LMS  array  approach,  whereby  we  assume 
a  given  reference  signal  and  use  the  well-known  LMS  and 
RLS  algorithms  to  generate  the  antenna  weights  recursively. 

A  method  for  generating  a  suitable  reference  signal  for 
spread  spectrum  signals  was  introduced  by  Compton  [2, 1]. 
This  method  is  based  on  the  premise  that  the  combination  of 
despreading,  filtering,  and  respreading,  which  occurs  in  the 
feedback  loop  shown  in  Figure  1 ,  provides  a  reference  signal 
consisting  of  the  desired  user’s  signal,  and  an  interference 
component  which  is  uncorrelated  with  the  received  signal  at 
the  antenna  elements. 

For  this  simulation,  we  assume  the  feedback  loop  op¬ 
erates  ideally,  and  use  a  reference  signal  consisting  of  a 
normalized version of the desired user’s signal plus an interference
component. The interference component is modeled
as a zero-mean complex Gaussian random variable with
variance $\sigma_r^2$, and uncorrelated with the received signal at the
antenna  elements.  Also,  for  this  simulation,  we  assume  both 
user  types  have  an  activity  factor  of  one.  In  future  work,  we 
will  extend  this  to  the  case  of  user  types,  such  as  voice,  with 
activity  factors  less  than  one. 


Figure  4.  Convergence  of  RLS  and  LMS  algo¬ 
rithms  for  user  type  2 


Figure 4 shows the results of antenna weight convergence
for the LMS and RLS algorithms for user type 2 given
$K_1$ = 60 and $K_2$ = 20. Similar results were observed for user
type 1. The performance metric plotted is the norm-squared
of the difference between the algorithm-computed antenna
weights and the optimal antenna weights. Results shown are
averaged over 100 independent trials. From the results, we
observe that both algorithms converge to the optimal antenna
weights. RLS converges within 10 data bits, whereas LMS
requires on the order of 100 data bits. In addition, the results
shown are for a reference signal with $\sigma_r^2$ = 0 (ideal case), 0.1,
and 1. Note, however, that this had no effect on either algorithm's
performance. All cases are indistinguishable in the plots.
This is expected, so long as the interference component of
the reference signal is uncorrelated with the received signal
at the antenna elements.
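
The weight-error metric described above can be reproduced in a simplified setting. The sketch below is not the simulation used in the paper: it assumes a symbol-rate model without spreading, three equal-power users at hypothetical angles, the common $w^H x$ output convention, and textbook LMS and RLS updates with hypothetical step size and initialization.

```python
import numpy as np

def steering(beta_deg, m):
    gamma = np.pi * np.sin(np.deg2rad(beta_deg))   # half-wavelength spacing
    return np.exp(-1j * gamma * np.arange(m))

def simulate(n_samples=2000, m=4, mu=0.01, lam=1.0, ref_noise_var=0.0, seed=0):
    """Return squared weight errors (row 0: LMS, row 1: RLS) and the optimum."""
    rng = np.random.default_rng(seed)
    angles = [0.0, 30.0, -45.0]                    # user 0 is the desired user
    powers = np.array([1.0, 1.0, 1.0])
    noise_var = 0.1
    A = np.stack([steering(b, m) for b in angles], axis=1)
    R = (A * powers) @ A.conj().T + noise_var * np.eye(m)      # E[x x^H]
    w_opt = np.linalg.solve(R, np.sqrt(powers[0]) * A[:, 0])   # R^{-1} E[x d^*]

    w_lms = np.zeros(m, dtype=complex)
    w_rls = np.zeros(m, dtype=complex)
    P = np.eye(m, dtype=complex) / 0.01            # RLS inverse-correlation estimate
    err = np.zeros((2, n_samples))
    for n in range(n_samples):
        bits = rng.choice([-1.0, 1.0], size=len(angles))
        noise = np.sqrt(noise_var / 2) * (
            rng.standard_normal(m) + 1j * rng.standard_normal(m))
        x = A @ (np.sqrt(powers) * bits) + noise
        # Reference: desired bits plus an uncorrelated interference term.
        d = bits[0] + np.sqrt(ref_noise_var / 2) * (
            rng.standard_normal() + 1j * rng.standard_normal())

        e = d - np.vdot(w_lms, x)                  # LMS error, output is w^H x
        w_lms = w_lms + mu * x * np.conj(e)

        Px = P @ x                                 # RLS gain and update
        k = Px / (lam + np.vdot(x, Px))
        e = d - np.vdot(w_rls, x)
        w_rls = w_rls + k * np.conj(e)
        P = (P - np.outer(k, np.conj(x) @ P)) / lam

        err[0, n] = np.linalg.norm(w_lms - w_opt) ** 2
        err[1, n] = np.linalg.norm(w_rls - w_opt) ** 2
    return err, w_opt

err, w_opt = simulate()
```

Calling `simulate(ref_noise_var=1.0)` adds an uncorrelated interference term to the reference; since that term does not change the cross-correlation with the array input, the optimum weights that both algorithms approach stay the same.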

5.  Conclusion 

In this paper, we have shown that the implementation of
adaptive arrays is a promising technique for enhancing
the multi-media services of a CDMA network. These enhancements
include increased system capacities, as well as
improved robustness in handling multiple users with different
quality-of-service requirements. Future work will focus
on:  (1)  extending  the  BER  analysis  and  weight  control  sim¬ 
ulations  to  include  multipath  fading  (IS-95),  (2)  further  ex¬ 
amination  of  the  reference  signal  generation  for  a  multi-rate 
CDMA  environment,  and  (3)  extending  the  weight  control 
simulations  to  include  users  with  non-unity  activity  factors. 
ABSTRACT

Numerous authors have attempted to improve the performance
of eigenstructure methods, but none of these approaches
employs the additional information arising
when several direction of arrival (DOA) estimation
algorithms (referred to as underlying estimators) are
used simultaneously. In this paper, we show that by
exploiting this information, one can achieve much better
DOA estimation performance than that of each underlying
estimator used separately. We introduce a Joint
Estimation  Strategy  (JES)  which  represents  a  simple 
and  effective  way  of  extracting  and  combining  such  in¬ 
formation.  This  strategy  is  then  applied  to  the  set 
of  eigenstructure  underlying  DOA  estimators  includ¬ 
ing  the  MUSIC  and  Generalized  Min-Norm  (GMN)  es¬ 
timators. 

1.  INTRODUCTION 

Eigenstructure  techniques  have  proven  to  be  an  excel¬ 
lent  tool  for  estimating  DOA’s  of  multiple  narrowband 
sources  in  passive  sensor  arrays  [1],  [2].  At  high  sig¬ 
nal  to  noise  ratios  (SNR’s)  and  with  a  large  number  of 
snapshots, the eigenstructure techniques provide excellent
estimation performance because the error variance
is comparable to the Cramér-Rao bound (CRB). However,
their performance becomes severely degraded at
low SNR and when the number of data snapshots is
small. The problem of improving the performance
of eigenstructure techniques has recently attracted significant
attention. However, none of the known approaches to
this problem employs the additional information arising
when several underlying DOA estimation methods
are used simultaneously. In this paper, we demonstrate
that by taking this information into account, one
can  achieve  much  better  DOA  estimation  performance 
as  compared  to  the  conventional  case,  when  each  under¬ 
lying  estimator  is  used  separately.  We  propose  a  Joint 
Estimation  Strategy  (JES)  which  represents  a  simple 

This  work  was  supported  by  the  Alexander  von  Humboldt 
Foundation  and  the  SASPARC  Project  of  INTAS. 


and  effective  way  of  extracting  and  combining  such  in¬ 
formation.  This  strategy  is  then  applied  to  the  set  of 
eigenstructure  underlying  DOA  estimators,  namely,  to 
the  MUSIC  estimator  [1]  and  the  family  of  GMN  tech¬ 
niques  [3],  [4]. 

2. JOINT ESTIMATION STRATEGY

The  central  idea  of  JES  is  resampling.  Several  statis¬ 
tical  resampling  schemes  are  available,  including  the 
bootstrap  scheme  based  on  the  resampling  of  initial 
data. In contrast to the bootstrap, our strategy can
be interpreted as a resampling of the spatial spectrum.

In  order  to  explain  the  key  idea  of  JES,  let  us  con¬ 
sider  a  trial  including  the  estimation  of  the  covariance 
matrix  using  M  data  snapshots.  The  instantaneous 
performance  of  any  estimation  algorithm  achieved  in 
this  single  trial  (without  any  statistical  averaging)  is 
hereafter  referred  to  as  a  local  behavior  of  this  algo¬ 
rithm.  Due  to  the  fact  that  the  underlying  estimators 
are  functionals  of  the  sample  covariance  matrix  (i.e., 
are  different  random  functions  of  steering  angle),  their 
local  behavior  is  different  in  each  estimation  trial.  For 
example,  considering  two  underlying  estimators  with 
comparable  performance  and  increasing  number  of  tri¬ 
als  tested,  one  can  always  find  some  trials  where  the 
first  estimator  resolves  the  sources  while  the  second  es¬ 
timator  does  not.  In  turn,  some  other  trials  always  ex¬ 
ist,  which  demonstrate  the  reverse  local  behavior  where 
the  first  estimator  does  not  resolve  the  sources  while  the 
second  estimator  does.  This  illustrates  the  intuitively 
clear  fact  that  the  probability  of  source  resolution  at 
least  by  one  estimator  among  the  simultaneously  used 
estimators is never lower than that by each underlying
estimator exploited separately. Evidently, for the
extraction of the useful information arising when several
estimators are used simultaneously, one should test
the local behavior of each DOA estimator among the
full set of underlying estimators and then sort these
estimators into two groups: the group of “successful”
estimators resolving the sources in this particular trial
and the group of “unsuccessful” estimators which cannot
resolve them. Then, only “successful” estimators
should be used, in a way that allows the results
from each of them to be combined.

Let us consider the set of k underlying estimators
with spectral functions $f_i(\theta)$, i = 1, 2, ..., k, of angle
$\theta$, which are calculated in parallel using the same data
snapshots. Assume also that preliminary approximate
estimates $\hat{q}$ and $\hat{\Theta}$ of the number of sources q and the
angular sectors of source localization $\Theta$ are available.
Then, testing the following hypothesis for each underlying
estimator leads to the appropriate grouping of the
“successful” DOA estimators:

$\mathcal{H}$: The function $f_i(\theta)$ has more than $\hat{q} - 1$ separate
spectral peaks localized in $\hat{\Theta}$.

Now, we are ready to formulate JES. It includes the
following steps:

• Estimate the number of sources q using one of the
existing signal detection techniques [5].

• Estimate the angular sectors of source localization
$\Theta$. One of the possible ways of estimating $\Theta$
using the conventional beamformer is [3]:

$\hat{\Theta} = [\tilde{\theta}_1 - \alpha\Delta_1^{left}, \tilde{\theta}_1 + \alpha\Delta_1^{right}] \cup \cdots \cup [\tilde{\theta}_p - \alpha\Delta_p^{left}, \tilde{\theta}_p + \alpha\Delta_p^{right}]$  (1)

where $\tilde{\theta}_l$, l = 1, 2, ..., p are the coordinates of
the significant peaks of the conventional beamformer
output, p is the total number of significant
peaks, $\Delta_l^{left}$ and $\Delta_l^{right}$ determine the
boundaries of each subinterval of the estimated angular
sectors $\hat{\Theta}$, and $\alpha$ is a positive coefficient
close to 1. If the l-th peak has both right and left
−3 dB decrease levels, then $\Delta_l^{right}$ and $\Delta_l^{left}$ can
be chosen as the angular distances between the maximum
of the l-th peak and the right and left points of its −3
dB decrease, respectively. If the l-th peak has
no right or left −3 dB decrease level, then $\Delta_l^{right}$
and $\Delta_l^{left}$ can be chosen as the angular distances
between the maximum of the l-th peak and the
corresponding right or left closest point at which
the beamformer output changes from a decreasing
to an increasing function.

• Test the hypothesis $\mathcal{H}$ for each DOA estimator
from the total number of underlying estimators.

• If the hypothesis $\mathcal{H}$ is valid for m > 0 estimators
$f_i(\theta)$, i = 1, 2, ..., m from the total number of
k estimators $f_i(\theta)$, i = 1, 2, ..., k, then estimate
the l-th DOA $\theta_l$ as:

$\hat{\theta}_l = \frac{1}{m} \sum_{i=1}^{m} \theta_l^{(i)}$  (2)

where $\theta_1^{(i)} < \theta_2^{(i)} < \cdots < \theta_q^{(i)}$ is the ordered set
of angles corresponding to the q main maxima of
the function $f_i(\theta)$.

• If the hypothesis $\mathcal{H}$ is wrong for all DOA estimators
from the total number of underlying estimators,
then estimate the l-th DOA $\theta_l$ as:

$\hat{\theta}_l = \frac{1}{k} \sum_{i=1}^{k} \theta_l^{(i)}$  (3)

where $\theta_1^{(i)} < \theta_2^{(i)} < \cdots < \theta_q^{(i)}$ is the ordered set
of angles corresponding to the q main maxima of
the function $f_i(\theta)$.


Equation  (2)  corresponds  to  the  so-called  censored  av¬ 
eraging,  using  only  “successful”  estimators.  In  turn,  (3) 
corresponds  to  the  case  when  all  estimators  are  “unsuc¬ 
cessful”  and  we  have  no  reasons  to  prefer  one  estimator 
to  another. 
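
The censored averaging of (2) and the fallback of (3) can be sketched as follows. This is an illustrative implementation under stated assumptions: spectra are sampled on a common angular grid, peaks are taken as simple local maxima, the sector estimate is passed in as a boolean mask, and an estimator with fewer than q peaks has its strongest peaks repeated in the fallback case. None of these details are specified in the text, and the function names are hypothetical.

```python
import numpy as np

def local_peaks(spectrum):
    """Indices of local maxima of a sampled spectrum, strongest first."""
    s = np.asarray(spectrum, dtype=float)
    idx = np.flatnonzero((s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])) + 1
    return idx[np.argsort(s[idx])[::-1]]

def jes_estimate(spectra, grid, sector_mask, q):
    """Return (DOA estimates, number of 'successful' estimators)."""
    grid = np.asarray(grid, dtype=float)
    successful, everyone = [], []
    for s in spectra:
        idx = local_peaks(s)
        inside = idx[sector_mask[idx]]
        if len(inside) >= q:                  # hypothesis H holds for this estimator
            successful.append(np.sort(grid[inside[:q]]))
        if len(idx) == 0:                     # monotone spectrum: use its maximum
            idx = np.array([int(np.argmax(s))])
        top = idx[:q] if len(idx) >= q else np.resize(idx, q)
        everyone.append(np.sort(grid[top]))
    chosen = successful if successful else everyone   # (2) if any succeed, else (3)
    return np.mean(chosen, axis=0), len(successful)

def bumps(grid, centers, width=0.5):
    """Toy spectrum: a sum of Gaussian bumps (hypothetical test data)."""
    return sum(np.exp(-(grid - c) ** 2 / (2 * width ** 2)) for c in centers)

grid = np.linspace(-10.0, 10.0, 201)
sector = (grid >= -2.0) & (grid <= 6.0)
spectra = [bumps(grid, [0.0, 4.0]),           # resolves both sources
           bumps(grid, [0.2, 4.2]),           # resolves both sources
           bumps(grid, [2.0], width=2.0)]     # merges them into one peak
doas, n_ok = jes_estimate(spectra, grid, sector, q=2)
doas_fallback, n_ok_fallback = jes_estimate([spectra[2]], grid, sector, q=2)
```

In this toy case the merged-peak spectrum is excluded from the average, so it does not pull both estimates toward the midpoint of the two sources.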

The presented strategy is a universal approach because
it can be applied to any possible set of underlying
estimators having angular spectral functions. The
root versions of estimators can also be included in JES,
using an appropriate transformation to a smooth
spectral-type function [3]. This strategy allows coherent
source scenarios to be handled because it permits any
preprocessing within each underlying estimator: for example,
the well-known spatial smoothing technique can
be used. Moreover, one can choose underlying estimators
that perform well in coherent source environments.

3.  APPLICATION  OF  JES  TO 
EIGENSTRUCTURE  DOA  ESTIMATORS 

This  section  describes  the  underlying  eigenstructure 
DOA  estimators  which  can  be  successfully  exploited 
in  JES. 

The i-th snapshot of the n × 1 complex vector of
n-element array outputs is given by

$r(i) = A s(i) + n(i)$  (4)

where $A = [a(\theta_1), \ldots, a(\theta_q)]$ is the n × q matrix of the
source wavefront vectors, $a(\theta)$ is the n × 1 wavefront
vector corresponding to the direction $\theta$, $s(i)$ is the q × 1
vector of random source waveforms, and $n(i)$ is the
n × 1 vector of sensor noise. The n × n spatial covariance
matrix of array outputs can be expressed as [1], [2]:

$R = E[r(i) r^H(i)] = A S A^H + \sigma^2 I$  (5)

where S is the q × q covariance matrix of signal waveforms,
I is the n × n identity matrix, and $E[\cdot]$ and $H$ denote
the expectation operator and the Hermitian transpose,
respectively. The sample covariance matrix is given by

$\hat{R} = \frac{1}{M} \sum_{i=1}^{M} r(i) r^H(i)$  (6)

The eigendecomposition of the matrix (6) can be expressed
as

$\hat{R} = \sum_{i=1}^{n} \lambda_i u_i u_i^H$  (7)

where $\lambda_i$ ($\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$) and $u_i$ are the i-th
sample eigenvalue and i-th corresponding sample eigenvector,
respectively. The popular MUSIC technique [1]
estimates the DOA’s as locations of the q highest peaks of
the spectral function

$f_{MUSIC}(\theta) = \frac{1}{\| U_N^H a(\theta) \|^2}$  (8)

where $U_N = [u_{q+1}, u_{q+2}, \ldots, u_n]$ is the n × (n − q) matrix
constructed with the noise-subspace eigenvectors.

The GMN method [3], [4] represents the straightforward
extension of the popular Kumaresan-Tufts MN
technique [2] and estimates the DOA’s as locations of
the q highest peaks of the spectral function

$f_{GMN}^{(i)}(\theta) = \frac{1}{| a^H(\theta) w_i |^2}$  (9)

where the n × 1 vector $w_i$ is obtained by solving the
following conditional minimization problem:

$\min_{w_i} w_i^H w_i \quad \text{subject to} \quad U_S^H w_i = 0, \quad w_i^H e_i = 1$  (10)

Here $U_S = [u_1, u_2, \ldots, u_q]$ and $e_i$ is the vector with
all zero elements except for the i-th one, which is equal
to 1. For i = 1 the GMN estimator coincides with the
conventional MN estimator [2], $f_{GMN}^{(1)}(\theta) = f_{MN}(\theta)$.
The solution of (10) can be written as [3]

$f_{GMN}^{(i)}(\theta) = \frac{1}{| a^H(\theta) U_N U_N^H e_i |^2}$  (11)

where the constant $[e_i^H U_N U_N^H e_i]^2$ is ignored in the
numerator of (11). Eqn. (11) describes the family of
GMN estimators, $f_{GMN}^{(i)}(\theta)$, i = 1, 2, ..., n.
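
The spectral functions (8) and (11) can be evaluated directly from the noise-subspace projector. The following is a minimal sketch, assuming a half-wavelength uniform linear array, a hypothetical two-source scenario at 0° and 8° with 20 dB SNR, and zero-based indexing of the GMN family (so `i = 0` gives the min-norm estimator); it is not the simulation code behind the reported figures.

```python
import numpy as np

def steering(theta_deg, n):
    """Wavefront vector of an n-element half-wavelength linear array."""
    return np.exp(1j * np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(n))

def noise_projector(R_hat, q):
    """U_N U_N^H built from the n - q smallest sample eigenvectors."""
    _, vecs = np.linalg.eigh(R_hat)            # eigenvalues in ascending order
    U_n = vecs[:, : R_hat.shape[0] - q]
    return U_n @ U_n.conj().T

def music_spectrum(P_n, grid_deg):
    n = P_n.shape[0]
    A = np.stack([steering(t, n) for t in grid_deg], axis=1)
    # a^H(theta) U_N U_N^H a(theta) for every grid angle.
    return 1.0 / np.real(np.einsum("ij,ik,kj->j", A.conj(), P_n, A))

def gmn_spectrum(P_n, grid_deg, i):
    n = P_n.shape[0]
    A = np.stack([steering(t, n) for t in grid_deg], axis=1)
    # |a^H(theta) U_N U_N^H e_i|^2, where U_N U_N^H e_i is column i of P_n.
    return 1.0 / np.abs(A.conj().T @ P_n[:, i]) ** 2

# Hypothetical scenario: 8 sensors, 2 uncorrelated sources, 200 snapshots.
rng = np.random.default_rng(1)
n, q, M = 8, 2, 200
A_true = np.stack([steering(t, n) for t in (0.0, 8.0)], axis=1)
S = (rng.standard_normal((q, M)) + 1j * rng.standard_normal((q, M))) / np.sqrt(2)
N = 0.1 * (rng.standard_normal((n, M)) + 1j * rng.standard_normal((n, M))) / np.sqrt(2)
X = A_true @ S + N
R_hat = X @ X.conj().T / M                     # sample covariance, as in (6)

grid = np.arange(-20.0, 30.0, 0.1)
P_n = noise_projector(R_hat, q)
f_music = music_spectrum(P_n, grid)
f_mn = gmn_spectrum(P_n, grid, 0)
```

Since every member of the GMN family reuses the same projector `P_n`, evaluating all n of them costs one extra matrix-vector product per estimator on top of the eigendecomposition.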

Below, in simulations, we apply JES to the set of n + 1
eigenstructure underlying DOA estimators, namely,
to the MUSIC estimator (8) and n GMN estimators
(11). The application of JES to these eigenstructure
estimators increases the computational burden only
insignificantly as compared with the MUSIC algorithm.
To reduce the computational cost, the relationship
between the MUSIC and GMN functions [3] as well as
the fast algorithms [6], [7] can be employed.

4.  SIMULATION  RESULTS 

In our simulations, we assume a uniform linear array of eight omnidirectional sensors with half-wavelength spacing, and two uncorrelated narrowband sources with equal power. A total of 100 statistically independent trials are used to obtain each simulated point of SNR. The number of snapshots taken in each trial is M = 100. Three scenarios with different source DOA separations are considered: 1) θ1 = 0°, θ2 = 2°; 2) θ1 = 0°, θ2 = 4°; and 3) θ1 = 0°, θ2 = 8°. Figures 1-3 show the experimental comparison of the JES-based, MUSIC and MN algorithms in terms of resolution probability for scenarios 1-3, respectively. Figures 4-6 show the experimental DOA estimation RMS errors of these algorithms compared with the CRB for scenarios 1-3, respectively. It follows from the simulations that the JES-based algorithm noticeably outperforms the MUSIC and MN (underlying) algorithms both in terms of resolution probability and RMS error.

Figure 1: Experimental probabilities of source resolution versus SNR for the first scenario.

Figure 2: Experimental probabilities of source resolution versus SNR for the second scenario.

Figure 3: Experimental probabilities of source resolution versus SNR for the third scenario.

Figure 4: Experimental RMS error of DOA estimation versus SNR for the first scenario.

Figure 5: Experimental RMS error of DOA estimation versus SNR for the second scenario.

Figure 6: Experimental RMS error of DOA estimation versus SNR for the third scenario.

ABSTRACT

We describe new methods for modeling the amplitude statistics of airborne radar clutter by means of alpha-stable distributions. We develop maximum likelihood-based techniques for estimating target angle and Doppler from radar measurements retrieved in the presence of impulsive noise modeled as a multivariate isotropic alpha-stable random process. We derive the Cramer-Rao bounds for the additive Cauchy interference scenario to assess the best-case estimation accuracy that can be achieved. The results are of great importance in the study of space-time adaptive processing (STAP) for airborne pulse Doppler radar arrays operating in impulsive interference environments.

1.  INTRODUCTION 

Future  advanced  airborne  radar  systems  must  be  able  to 
detect,  identify,  and  estimate  the  parameters  of  a  target 
in  severe  interference  backgrounds.  As  a  result,  the  prob¬ 
lem  of  clutter  and  jamming  suppression  has  been  the  fo¬ 
cus  of  considerable  research  in  the  radar  engineering  com¬ 
munity.  It  is  recognized  that  effective  clutter  suppression 
can  be  achieved  only  on  the  basis  of  appropriate  statis¬ 
tical  modeling.  Recently,  experimental  results  have  been 
reported  where  clutter  returns  are  impulsive  in  nature.  In 
addition,  a  statistical  model  of  impulsive  interference  has 
been  proposed,  which  is  based  on  the  theory  of  symmetric 
alpha-stable (SαS) random processes [1]. The model is of a statistical-physical nature and has been shown to arise under very general assumptions and to describe a broad class of impulsive interference.

Until recently, much of the work reported for radar systems has concentrated on target detection [2]. In this paper, we address the target parameter estimation problem through the use of radar array sensor data retrieved in the presence of impulsive interference. In particular, we derive Cramer-Rao bounds on angle and Doppler estimator accuracy for the case of additive sub-Gaussian noise. Initially, we consider the case of additive multivariate Cauchy noise, assuming knowledge of the underlying matrix of the distribution. The results obtained here can be viewed as generalizations of the work done in [3] to the 2-D frequency estimation problem in impulsive interference backgrounds. In Section 2, we present some necessary preliminaries on α-stable processes. In Section 3, we define the space-time adaptive processing (STAP) problem for airborne radar and form the maximum likelihood function. In Section 4, we present the Cramer-Rao analysis and derive bounds on the variances of the spatial and temporal frequency estimates. Finally, in Section 5, we give some examples of the joint target angle and Doppler estimation performance.

... the Office of Naval Research under Contract N00014-92-J-1034.

2.  SYMMETRIC  ALPHA-STABLE 
DISTRIBUTIONS 

In this section, we introduce the statistical model that will be used to describe the additive noise. The model is based on the class of complex isotropic SαS distributions, which are well suited for describing signals that are impulsive in nature.

The symmetric α-stable (SαS) distribution is best defined by its characteristic function

$$ \varphi(\omega) = \exp(j\delta\omega - \gamma|\omega|^{\alpha}) \qquad (1) $$

where $\alpha$ is the characteristic exponent restricted to the values $0 < \alpha \le 2$, $\delta$ ($-\infty < \delta < \infty$) is the location parameter, and $\gamma$ is the dispersion of the distribution. The dispersion plays a role analogous to the role that the variance plays for second-order processes. The characteristic exponent $\alpha$ is the most important parameter of the SαS distribution and it determines the shape of the distribution: the smaller the characteristic exponent $\alpha$ is, the heavier the tails of the SαS density.

SαS densities obey two important properties which further justify their role in data modeling: the stability property and the generalized central limit theorem. Unfortunately, no closed-form expressions exist for the general SαS probability density functions (pdf) except for the Cauchy and the Gaussian case. However, power series expansions can be derived for the general pdf's [1]. Here, we are interested in the family of complex isotropic SαS random variables. A complex SαS random variable $X = X_1 + jX_2$ is isotropic if and only if the bivariate distribution $(X_1, X_2)$ has uniform spectral measure. In this case, the characteristic function of $X$ can be written as

$$ \varphi(\omega) = E\{\exp(j\Re[\omega X^*])\} = \exp(-\gamma|\omega|^{\alpha}). \qquad (2) $$

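An important difference between the Gaussian and the other distributions of the SαS family is that only moments of order less than $\alpha$ exist for the non-Gaussian family members. If $X$ follows the isotropic stable distribution with dispersion $\gamma$, the so-called fractional lower order moments (FLOM) are given by

$$ E|X|^p = C_2(p,\alpha)\,\gamma^{p/\alpha} \quad \text{for } 0 < p < \alpha, \qquad (3) $$

where

$$ C_2(p,\alpha) = \frac{2^{p+1}\,\Gamma\!\left(\frac{p+2}{2}\right)\Gamma\!\left(-\frac{p}{\alpha}\right)}{\alpha\,\Gamma\!\left(-\frac{p}{2}\right)}. \qquad (4) $$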
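As a quick numerical illustration of (3) and (4), the FLOM constant can be evaluated with the standard gamma function. The sketch below assumes the form of $C_2(p,\alpha)$ written above; the function names are hypothetical and chosen only for this example.

```python
import math

def flom_constant(p, alpha):
    """C_2(p, alpha) of Eq. (4), defined for 0 < p < alpha <= 2."""
    if not (0.0 < alpha <= 2.0) or not (0.0 < p < alpha):
        raise ValueError("require 0 < p < alpha <= 2")
    num = 2.0 ** (p + 1) * math.gamma((p + 2) / 2) * math.gamma(-p / alpha)
    return num / (alpha * math.gamma(-p / 2))

def flom(p, alpha, dispersion):
    """E|X|^p of Eq. (3) for an isotropic complex SaS variable with the given dispersion."""
    return flom_constant(p, alpha) * dispersion ** (p / alpha)
```

For example, `flom(0.5, 1.0, 20.0)` gives the half-order moment of an isotropic Cauchy variable with dispersion 20; moments of order p ≥ α are rejected because they do not exist for α < 2.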


3.  STAP  PROBLEM  FORMULATION  AND 
MAXIMUM  LIKELIHOOD  FUNCTION 

Space-time  adaptive  processing  (STAP)  refers  to  multidi¬ 
mensional  adaptive  algorithms  that  simultaneously  combine 
the  signals  from  the  elements  of  an  array  antenna  and  the 
multiple  pulses  of  a  coherent  radar  waveform,  to  suppress 
interference  and  provide  target  detection  [4,  2,  5]. 

Consider a uniformly spaced linear array radar antenna consisting of $N$ elements, which transmits a coherent burst of $M$ pulses at a constant pulse repetition frequency (PRF) $f_r$ and over a certain range of directions of interest. The pulse repetition interval is $T_r$. A space-time snapshot refers to the $MN \times 1$ vector of samples corresponding to a single range gate. Given a single snapshot containing a target at angle $\phi$ and Doppler frequency $f$, the space-time snapshot can be written as [4]

$$ \mathbf{x} = \beta\,\mathbf{v}(\phi, f) + \mathbf{n} \qquad (5) $$

where $\beta$ is the target's complex amplitude given by

$$ \beta = x + jy. \qquad (6) $$

The vector $\mathbf{v}$ is an $NM \times 1$ vector called the space-time steering vector. It may be expressed as

$$ \mathbf{v}(\phi, f) = \mathbf{b}(f) \otimes \mathbf{a}(\phi) \qquad (7) $$

where $\mathbf{a}(\phi)$ is the $N \times 1$ spatial steering vector containing the interelement phase shifts for a target at $\phi$, and $\mathbf{b}(f)$ is the $M \times 1$ temporal steering vector that contains the interpulse phase shifts for a target with Doppler $f$. It is assumed that the functional form of $\mathbf{v}(\phi, f)$ is known. In addition, we can write

$$ v_i(\phi, f) = b_{f(i)}(f) \cdot a_{g(i)}(\phi) \qquad (8) $$

where $v_i(\phi, f)$ is the $i$-th element of the space-time steering vector $\mathbf{v}(\phi, f)$, $1 \le f(i) \le M$, and $1 \le g(i) \le N$.
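The Kronecker structure in (7) and the element-wise form in (8) can be checked with a short sketch. The phase-ramp steering vectors, the sign convention, and the argument names `psi` and `omega` (per-element and per-pulse phase increments) are assumptions made for illustration only.

```python
import numpy as np

def spatial_steering(psi, N):
    """N x 1 spatial steering vector with phase increment psi per element (assumed form)."""
    return np.exp(1j * psi * np.arange(N))

def temporal_steering(omega, M):
    """M x 1 temporal steering vector with phase increment omega per pulse (assumed form)."""
    return np.exp(1j * omega * np.arange(M))

def space_time_steering(psi, omega, N, M):
    """NM x 1 space-time steering vector v = b (kron) a, as in Eq. (7)."""
    return np.kron(temporal_steering(omega, M), spatial_steering(psi, N))
```

With 0-based indexing, element `i` of the result equals `b[i // N] * a[i % N]`, so the index maps of (8) correspond to `f(i) = i // N` and `g(i) = i % N` under this ordering.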

The snapshot also contains a noise component $\mathbf{n}$. Here, the noise includes clutter, jamming, thermal noise, and any other undesired signals. As a first approximation to the problem, we assume that the noise present at the array is statistically independent both along the array sensors and along time, and is modeled as a complex isotropic Cauchy process with marginal pdf given by

$$ f(r) = \frac{\gamma}{2\pi (r^2 + \gamma^2)^{3/2}}, \qquad (9) $$

where $r = |n_i|$ is the modulus of a noise sample. Under the independence assumption it follows from (5) and (7) that the joint density function for the case of a single snapshot is given by [3]

$$ f(\mathbf{n}) = \prod_{i=1}^{NM} \frac{\gamma}{2\pi(\gamma^2 + |n_i|^2)^{3/2}}. \qquad (10) $$
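For simulation purposes, independent noise samples following the isotropic Cauchy density (9) can be drawn by inverting the distribution of the modulus. The sketch below assumes that (9) is the density over the complex plane, so that the modulus r has CDF F(r) = 1 - γ/sqrt(r² + γ²) and the phase is uniform; the function names are hypothetical.

```python
import numpy as np

def cauchy_radius_quantile(u, gamma):
    """Inverse CDF of the modulus: solves 1 - gamma / sqrt(r^2 + gamma^2) = u for r."""
    u = np.asarray(u, dtype=float)
    return gamma * np.sqrt(1.0 / (1.0 - u) ** 2 - 1.0)

def sample_isotropic_cauchy(size, gamma, rng):
    """Draw i.i.d. complex isotropic Cauchy samples with dispersion gamma."""
    r = cauchy_radius_quantile(rng.uniform(0.0, 1.0, size), gamma)
    phase = rng.uniform(0.0, 2.0 * np.pi, size)   # isotropy: phase is uniform
    return r * np.exp(1j * phase)
```

Because the variance is infinite, sample quantiles such as the median of the modulus (γ√3 under this assumption) are a more useful sanity check than sample moments.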

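In the following, it will be convenient to work with the normalized spatial and temporal frequency variables:

$$ \psi = \frac{2\pi d}{\lambda_0}\sin\phi, \qquad \omega = 2\pi f T_r, \qquad (11) $$

where $d$ denotes the interelement spacing and $\lambda_0$ the wavelength. The estimation problem involves four real-valued parameters. We arrange them to form a $4 \times 1$ parameter vector

$$ \Theta = [\theta_1\ \theta_2\ \theta_3\ \theta_4]^T = [\psi\ \omega\ x\ y]^T. \qquad (12) $$

Then, given a single snapshot $\mathbf{x}$, the log-likelihood function $L(\Theta)$, ignoring the constant terms, is given by

$$ L(\Theta) = -\frac{3}{2}\sum_{i=1}^{NM} \log\left(\gamma^2 + |x_i - \beta v_i(\psi,\omega)|^2\right). \qquad (13) $$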
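The Cauchy log-likelihood (13) is simple to evaluate, and a coarse maximum likelihood estimate can be obtained by a grid search over the normalized frequencies. The following is a minimal sketch under stated assumptions: the phase-ramp steering vectors and their sign convention are illustrative, the amplitude `beta` and dispersion `gamma` are taken as known, and all function names are hypothetical.

```python
import numpy as np

def space_time_steering(psi, omega, N, M):
    """Assumed steering model: v = b(omega) kron a(psi) with unit-modulus phase ramps."""
    a = np.exp(1j * psi * np.arange(N))
    b = np.exp(1j * omega * np.arange(M))
    return np.kron(b, a)

def cauchy_log_likelihood(x, v, beta, gamma):
    """Eq. (13): -(3/2) * sum_i log(gamma^2 + |x_i - beta * v_i|^2), constants dropped."""
    resid = x - beta * v
    return -1.5 * np.sum(np.log(gamma ** 2 + np.abs(resid) ** 2))

def grid_search(x, beta, gamma, N, M, psi_grid, omega_grid):
    """Return (L_max, psi_hat, omega_hat) maximizing Eq. (13) over a 2-D grid."""
    best = (-np.inf, None, None)
    for psi in psi_grid:
        for omega in omega_grid:
            v = space_time_steering(psi, omega, N, M)
            L = cauchy_log_likelihood(x, v, beta, gamma)
            if L > best[0]:
                best = (L, psi, omega)
    return best
```

In practice the amplitude is unknown as well, so the search would run over all four components of Θ or be followed by a local optimizer; the logarithm in (13) is what limits the influence of large noise outliers on the estimate.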


4.  CRAMER-RAO  BOUND  ANALYSIS 

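The Cramer-Rao bound for the error variance of an unbiased estimator $\hat{\Theta}$ satisfies

$$ C_{\hat{\Theta}} - J^{-1}(\Theta) \ge 0 \qquad (14) $$

where $C_{\hat{\Theta}}$ is the covariance matrix of $\hat{\Theta}$ and $\ge 0$ is interpreted as meaning that the matrix is positive semidefinite. The matrix $J(\Theta)$ is the Fisher information matrix given by

$$ J(\Theta) = E\{[\partial L(\Theta)/\partial\Theta][\partial L(\Theta)/\partial\Theta]^T\}. \qquad (15) $$

First, we calculate the derivatives of the log-likelihood function given in (13) with respect to the components of $\Theta$. We have that

$$ \frac{\partial L}{\partial\psi} = 3\sum_{i=1}^{NM} \frac{\Re\{\beta^* b^*_{f(i)} (d^a_{g(i)})^* n_i\}}{\gamma^2 + |n_i|^2} \qquad (16) $$

where $d_i^a = \partial a_i/\partial\psi$, $i = 1, \ldots, N$. In addition,

$$ \frac{\partial L}{\partial\omega} = 3\sum_{i=1}^{NM} \frac{\Re\{\beta^* a^*_{g(i)} (d^b_{f(i)})^* n_i\}}{\gamma^2 + |n_i|^2} \qquad (17) $$

where $d_i^b = \partial b_i/\partial\omega$, $i = 1, \ldots, M$. Additionally,

$$ \frac{\partial L}{\partial x} = 3\sum_{i=1}^{NM} \frac{\Re\{v_i^* n_i\}}{\gamma^2 + |n_i|^2} \qquad (18) $$

and

$$ \frac{\partial L}{\partial y} = 3\sum_{i=1}^{NM} \frac{\Im\{v_i^* n_i\}}{\gamma^2 + |n_i|^2}. \qquad (19) $$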

By performing the second derivatives and expectations in a similar way, the Fisher information matrix $J(\Theta)$ is derived to be

MW  II  da  ir 

yMSa 

xMSa 

\Wp 

II  11^ 

yNSb 

xNSb 

yMSa 

yNSb 

MN 

0 

xMSa 

xNSb 

0 

MN 

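where

$$ S_a = \sum_{i=1}^{N} |d_i^a|, \qquad S_b = \sum_{i=1}^{M} |d_i^b|, \qquad \rho = \sum_{i=1}^{MN} |d^a_{g(i)}|\,|d^b_{f(i)}|, $$

and $d_a = [d_1^a \cdots d_N^a]$, $d_b = [d_1^b \cdots d_M^b]$. Since target angle and Doppler are the two parameters of primary interest, we shall focus on the upper left $2 \times 2$ block of the Fisher information matrix, $J_{2\times 2}$. The inverse of matrix $J_{2\times 2}$ is obtained by applying the partitioned matrix inversion lemma. The result is

$$ J^{-1}_{2\times 2}(\Theta) = \frac{5\gamma^2}{3|\beta|^2\Delta} \begin{bmatrix} N(\|d_b\|^2 - S_b^2/M) & S_a S_b - \rho \\ S_a S_b - \rho & M(\|d_a\|^2 - S_a^2/N) \end{bmatrix} \qquad (20) $$

where $\Delta = (M\|d_a\|^2 - \frac{M}{N}S_a^2)(N\|d_b\|^2 - \frac{N}{M}S_b^2) - (S_a S_b - \rho)^2$. The Cramer-Rao bounds of the resulting spatial and temporal frequency estimates are obtained from (20) as

$$ \mathrm{CRB}(\psi) = \frac{5\gamma^2 N(\|d_b\|^2 - S_b^2/M)}{3|\beta|^2\Delta} \qquad (21) $$

$$ \mathrm{CRB}(\omega) = \frac{5\gamma^2 M(\|d_a\|^2 - S_a^2/N)}{3|\beta|^2\Delta} \qquad (22) $$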

Finally, by using (11), we get

Figure 1: MLG (top) and MLC (bottom) angle-Doppler spectra (N = 5, M = 4, φ = −10°, fT_r = 0.1). Additive Gaussian noise (α = 2, γ = 20, GSNR = 4 dB).


In  this  case,  it  follows  from  (21)  and  (22)  that 

and 

CRB{w)  =  —  •  ^2yv2(M2  -!)• 

5.  SIMULATION  RESULTS 


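$$ \mathrm{CRB}(\phi) = \mathrm{CRB}(\psi)\left(\frac{\lambda_0}{2\pi d\cos\phi}\right)^2 \qquad (23) $$

and

$$ \mathrm{CRB}(f) = \mathrm{CRB}(\omega)\left(\frac{1}{2\pi T_r}\right)^2. \qquad (24) $$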

A useful insight on the CRB can be gained if we consider the case of a linear array whose sensors are spaced a half-wavelength apart, and a waveform with a uniform pulse repetition interval. The spatial and temporal steering vectors for such a system are:

11  [11 

-—jtb  — .70; 

aW  =  .  ,  b(w)  =  .  .  (25) 


In this simulation experiment, we test the robustness of the maximum likelihood estimator based on the Cauchy assumption (MLC). We assume a linear array with N = 5 elements that transmits a coherent burst of M = 4 pulses. We consider a single target located at φ = 10° and having Doppler such that fT_r = ω/2π = 0.1. Since the alpha-stable family determines processes with infinite variance for α < 2, we define an alternative signal-to-noise ratio (SNR). Namely, we define the Generalized SNR (GSNR) to be the ratio of the signal power over the noise dispersion γ:

GSNi?  =  10  log  (28) 

In Figures 1 and 2 we plot isosurfaces of space-time spectral estimates (likelihood functions) for the maximum likelihood estimator based on the Gaussian assumption (MLG) [2] and for the maximum likelihood estimator based on the Cauchy assumption (MLC). The likelihood functions are formed by using 50 space-time snapshots. In Figure 1, since the additive noise to the sensors is Gaussian (α = 2), the MLG likelihood function is based on the correct assumption about the noise distribution. On the other hand, in Figure 2, the additive noise to the sensors is α-stable with α = 1.5 and neither the MLG nor the MLC likelihood function relies on the correct assumption about the noise distribution. As we can see from the figures, the MLC likelihood function, based on the Cauchy assumption, attains its maximum value very close to the true angle and Doppler values in both cases of additive stable noise. On the other hand, the MLG likelihood function, based on the Gaussian assumption, cannot localize the target accurately when the actual data distribution deviates from the Gaussian case.

Figure 2: MLG (top) and MLC (bottom) angle-Doppler spectra (N = 5, M = 4, φ = −10°, fT_r = 0.1). Additive stable noise (α = 1.5, γ = 20, GSNR = 4 dB).

The observed robustness of the MLC method is quantified in Figure 3, which shows the resulting mean-square error curve of the estimated Doppler as a function of the characteristic exponent α of the additive noise. The results are based on 300 Monte Carlo runs. As we can clearly see, the Cauchy beamformer is practically insensitive to changes of α. On the other hand, the MLG algorithm exhibits very large mean-square estimation error in non-Gaussian noise environments.


Figure 3: MSE of the estimated Doppler as a function of the characteristic exponent α.

6.  CONCLUSIONS 

We considered the problem of target angle and Doppler estimation with an airborne radar employing space-time adaptive processing. We derived Cramer-Rao bounds on angle and Doppler estimator accuracy for the case of additive multivariate Cauchy interference of known diagonal underlying matrix. The bounds are functions of a generalized SNR function, similarly to the Gaussian case where the bounds are functions of the SNR. As shown in (21) and (22), target angle accuracy is a function of Doppler frequency and vice-versa. In addition, we introduced a new joint spatial- and Doppler-frequency estimation technique based on the maximum likelihood Cauchy function (MLC) and we showed that the Cauchy estimator gives better results in a wide range of impulsive noise (clutter, jamming, thermal) environments.

Abstract

Most existing array processing techniques for estimating the directions of arrival or signal copy rely heavily on the plane-wave assumption of far-field sources. When the
sources  are  located  relatively  close  to  the  array,  these  tech¬ 
niques  may  no  longer  perform  satisfactorily.  In  this  pa¬ 
per,  we  present  an  asymptotic  performance  analysis  of  a  re¬ 
cently  proposed  ESPRIT-like  method  for  passive  localiza¬ 
tion  of  near-field  sources.  The  algorithm,  based  on  fourth- 
order  cumulants,  is  formulated  for  observations  collected 
from  a  single  uniformly  spaced  linear  array.  We  examine 
the  least-squares  version  of  the  algorithm  and  derive  the  ex¬ 
pressions  for  the  asymptotic  variances  of  the  estimated  di¬ 
rections  of  arrival  and  estimated  ranges  of  the  sources. 


1  Introduction 


Most  array  processing  methods  which  estimate  the  di¬ 
rections  of  arrival  of  sources  make  the  assumption  that  the 
sources  are  located  relatively  far  from  the  array,  so  that 
the  waves  emitted  by  the  sources  can  be  considered  plane 
waves. However, when a source is located close to the array (i.e., near-field), the plane wave assumption may no longer be valid and the wavefront must be characterized by both the
azimuth  and  range.  Methods  based  on  the  far-field  assump¬ 
tion  are  not  applicable  to  this  situation.  The  near-field  sit¬ 
uation  can  occur,  for  example,  in  sonar,  electronic  surveil¬ 
lance,  and  seismic  exploration. 

In narrowband array processing, several variants of the ESPRIT algorithm using higher-order statistics have been presented. Recently, Challa and Shamsunder [1] developed a Total Least Squares ESPRIT-like algorithm, based on fourth-order cumulants, for estimating the azimuth and range of near-field sources impinging on a uniformly spaced linear array.

This work was supported by the Office of Naval Research under contract No. N00014-95-1-0912.

In  this  paper,  we  derive  asymptotic  expressions  for  the 
variances  of  estimates  of  the  azimuth  and  range  parameters 
using  the  higher-order  ESPRIT-like  algorithm  of  Challa  and 
Shamsunder  [1].  While  Challa  and  Shamsunder  formulated 
a  total  least  squares  algorithm,  we  give  expressions  based 
on  a  least  squares  version  of  the  algorithm.  However,  it  has 
been  shown  by  Rao  and  Hari  [2]  that  the  asymptotic  vari¬ 
ances  for  these  two  versions  are  the  same.  Some  of  the  ex¬ 
pressions  derived  in  this  paper  are  based  on  the  work  pre¬ 
sented  in  [4]. 

2  Problem  Formulation 


We use the narrowband model for array processing of near-field sources [1]. The output of the $m$-th sensor of the uniformly spaced linear array is given by

$$ x_m(t) = \sum_{i=1}^{N} s_i(t)\, e^{j(\omega_i m + \phi_i m^2)} + n_m(t), \qquad (1) $$

for $m = -N_x + 1, \ldots, 0, 1, \ldots, N_x$ (i.e., there are $2N_x$ sensors). The array is shown in Figure 1. In matrix form, Equation (1) can be written as

$$ x(t) = B s(t) + n(t) \qquad (2) $$

where the $(m, n)$ element of $B$ is given by

$$ B_{mn} = e^{j(\omega_n m + \phi_n m^2)}. \qquad (3) $$


0-8186-7576-4/96 $5.00 © 1996 IEEE

Figure 1. The uniformly spaced linear array configuration


can  compute  the  cumulant  matrices  €2,03  and  C4,  as  ex¬ 
plained  in  [1].  Combining  these  matrices,  one  can  form 


C  = 


where 


The  eigenvectors 


Cl 

C4 

C2 

c? 

Cl 

C3 

Cf 

C? 

Cl , 

[A^  ® 

[ei 

eN 

=  AC4,A 


H 


Ej, 

Ey 

E. 


(7) 

(8) 

(9) 


corresponding  to  the  N  nonzero  eigenvalues  of  C  can  be 
shown  to  yield 

Ey  =  Ea;’®  and  E^=ExT  (10) 


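The parameters $\omega_n$ and $\phi_n$ are functions of the azimuth $\theta_n$ and range $r_n$ of the $n$-th source:

$$ \omega_n = -2\pi\frac{\Delta}{L}\sin\theta_n \quad \text{and} \quad \phi_n = \pi\frac{\Delta^2}{L r_n}\cos^2\theta_n \qquad (4) $$

where $L$ is the wavelength of the source wavefronts and $\Delta$ is the separation between adjacent sensors. The goal is to estimate the parameters $\{\theta_1, \ldots, \theta_d, r_1, \ldots, r_d\}$ given the array data $x(t)$ for $0 \le t \le N_s$.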


where 

®  =  and  T  =  (11) 

for  some  invertible  matrix  T.  Hence  the  eigenvalues  of  ® 
and  T  allow  one  to  compute  the  azimuth  and  range  param¬ 
eters.  Furthermore,  ®  and  T  can  be  computed,  in  a  least 
squares  sense,  using 

®  =  E^  Ey  and  T  =  Ef  E^,  (12) 

respectively,  where  #  denotes  the  pseudo-inverse  of  a  ma¬ 
trix. 


2.1  ESPRIT-like  Algorithm 


3  Asymptotic  Performance  Analysis 


In  this  section,  we  summarize  the  higher-order  ESPRIT- 
like  algorithm  proposed  in  [1].  Assuming  that  the  source 
signals  are  zero-mean,  non-Gaussian,  statistically  indepen¬ 
dent,  and  stationary,  one  can  show  that  the  matrix  whose 
(m,  n)  element  is 

Ci(m,  n) 

=  cum{xj^(t),  Xm+l(t)i  ®n+l(^)> 

=  (5) 

i=l 

for $0 \le m, n \le N_x - 1$ is given by

$$ C_1 = A C_{4,s} A^H \qquad (6) $$

which has dimensions $N_x \times N_x$. The kurtosis of the $i$-th source is $c_{4,s_i}$. Similarly, using different sensor lags, one


In  this  section,  we  derive  the  asymptotic  variances  of  the 
estimated  azimuth  and  range  parameters  for  the  higher-order 
ESPRIT-like  algorithm  of  the  previous  section.  The  analy¬ 
sis  is  similar  to  that  given  in  [4],  where  only  the  estimated 
azimuth  parameter  is  analyzed. 

Azimuth 

For azimuth estimation, the quantity of interest is $\theta_i$ (assumed to be given in degrees), which is related to $\lambda_i$ by

A,- =  (13) 

The separation between adjacent sensors is $\Delta$ and the wavelength of the impinging wavefronts is $L$. Using a first-order Taylor series expansion, we have

CO _ ^ _  (14) 

''’-i4xt(cos(fi))A,- 
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Skipping  some  algebraic  steps,  we  get 


HSOi  •  SOi) 

=  i( _ L _ 

2  47rAcos(f^) 

2l 


-real{E{(a,)^}(An^}). 


(15) 


Range 

For  range  estimation,  the  quantity  of  interest  is  r* ,  which 
is  related  to  by 


727rj— cos  (i|5-) 


(16) 


Using  a  first-order  Taylor  series  expansion,  we  have 


5r,' 


hi  +  ^sm(f^)7i( 


Ai 


-i27r;^cos2(f^)7,- 


(17) 


After  some  algebra,  the  variance  of  the  estimated  range  pa¬ 
rameter  is 


E{6ri  -  Sri} 

1 

2(2x^cos2(f^))2 
•[E{|^7iP}  -  real{(7*)2E{(57,)2} 

■real{27;AiE{a*67j  -  27* A* E{ 5 A.- 57,.}} 

+A?E{(a*)2}-2E{|a,f}]  (18) 

where  the  quantities  E{|(57,  p},  E{6AJ‘(57j  },  E{(57j^},  and 
E{<5Aj<57j  },  which  are  the  covariances  of  eigenvalues,  are 
derived  next. 

3.1  Covariance  of  Eigenvalues 


In  LS-ESPRIT,  we  compute  two  matrices  ^  and  T  us¬ 
ing  Equation  (12)  and  then  perform  an  eigendecomposition 
to  get  their  associated  eigenvalues.  Let  Aj  be  an  eigenvalue 
of  T ,  Vj  be  the  corresponding  eigenvector,  and  q^-  be  the  cor¬ 
responding  left  eigenvector,  such  that 

Yvj  =  XiVi 

=  A^q,.  (19) 

Furthermore,  the  left  and  right  eigenvectors  can  be  chosen 
to  be  orthonormal,  so  that 

A*  =  q^Tvi.  (20) 


Under  most  circumstances,  the  matrix  Y  has  to  be  estimated 
using  finite  data.  An  error  5  Y  =  Y  — Y  in  estimating  Y  will 
cause  an  error  5  Aj  =  Aj—Ai.  As  a  first-order  approximation, 
the  error  6X  can  be  shown  to  be 

^Ai  =  q^5Yvi.  (21) 

It  follows  from  Equation  (10)  that  we  have  the  first-order  ap¬ 
proximations 

(E^  -f  SE^){r  +  6r)  6E,,  (22) 

E^sr  ^  (5E,  -  6E^r,  (23) 

and 

Sr  »  Ef  (5E,  -  Ef  6E^T.  (24) 

Using  (24)  in  (21)  and  noting  that  |A,  p  =  1,  we  get 

6Xi  =  q,.Ef(5E,v.-5E*$v.) 

=  -Aiq,Ef(Wi-A*W3)5E,Vi,  (25) 

where 

1  =  [  Ojji  0^  ]  ,  (26) 

W3  =  [  0,„  ]  .  (27) 

and 

^E,  =E,  ^E,.  (28) 

The  matrix  E^  is  defined  in  (9),  Im  is  the  m  x  m  identity 
matrix  and  0^  is  the  m  x  m  zero  matrix. 

Similarly,  let  p^-  and  b,  be  the  left  and  right  eigenvectors 
of  the  matrix  respectively,  and  ji  be  the  corresponding 
eigenvalue,  so  that 


7i=p,*b,.  (29) 

The  error  is  then  given  by 

6j,  =  p,.Ef  (W2  -  7iWi)5E,bi,  (30) 

where  Wi  is  defined  in  (26)  and  W2  is 

W2  =  [  ]  .  (31) 

The  variance  of  A,*  is  thus,  to  the  first-order, 

Eiia^n 

d  d 

=  q,-Ef(W3  -  XiW,)[J2  E 

g=lh-l 

E{Ss,Ss^}]  ■  (W3  -  AiWi)^(q.Ef  )'^,  (32) 


where  6sg  is  the  column  of  the  matrix  6Es  and  Vi^g  is  the 

element  of  the  vector  v* .  The  quantities  6sh  and  Vi^h  are 
similarly defined.

Following the above steps in an analogous manner, we can also derive the following first-order expressions.

E{(ai)2} 

d  d 

=  q^Ef  (W3  -  A.-Wi)E  Yj 

g-\  hzzl 

E{Ssg6sl}]iW3  -  AiWif  (q,Ef  f ,  (33) 
E{6Ai67n 

d  d 

=  q,Ef(W3-AiWi)EE^*.^^*> 

g=l  h=:l 

E{5sj6sf  }](W2  -  7,Wi)"(PiEf  )^,  (34) 

d  d 

=  p.E#(W2  -  7iWi)[E  E 

g=lh=l 

E{6s,5sf  }](W2  -  7.Wi)"(PiE#)^,  (35) 


where  ct*  is  an  eigenvalue  of  the  matrix  C  and  Sg^ai  is  the 
element  of  s^,  which  is  an  eigenvector  of  C.  The  nota¬ 
tion  (Omn  refers  to  the  (m,  n)  element  of  a  matrix.  We  note 
that  this  expression  is  greatly  simplified  when  the  signals  are 
Gaussian,  as  is  assumed  in  [2]. 

Furthermore,  the  unconjugated  covariance  of  the  sample 
eigenvectors  and  is,  to  the  first-order, 

E{6sgSsl} 

3Nx  3Nx  3iNrx  SNx  SAT^ 


i/g  n^h  01=1  02  =  1  6i  =  l  62  =  ! 
1=1  n=l 


^n,bi^h,b2  •E{(C  —  0)0102(6  —  0)5162} 

_ .  (39) 

(ag  -  «;)(“'•  “  “«) 

The  asymptotic  covariance  of  sample  fourth-order  cumu- 
lants,  denoted  by  E{(C  -  0)0102(6  -  0)^^jJ  and  by 
E{(6  -  0)0102(6  -  0)6162}  are  derived  in  [5]. 

4  Conclusion 


E{(57.)'} 

d  d 

=  p.Ef  (W2  -  7.Wi)E  E 

^=1  h=l 

E{5s,5sn](W2  -  7.Wi)'"(PiEf )"",  (36) 
E{ai57.} 

d  d 

=  q.Ef(W3-AiWi)EE’'‘-«^*> 

g=lh=l 

E{6sgSsl}](W2  -  7.Wi)"’(PiEf)^.  (37) 

These equations depend on the covariances of the eigenvectors, $E\{\delta s_g \delta s_h^H\}$ and $E\{\delta s_g \delta s_h^T\}$, of the sample cumulant matrix $\hat{C}$. In the following sections we derive these covariances.

3.2  Covariance  of  Eigenvectors 


By  using  the  first-order  Taylor  series  expansion  of  eigen¬ 
vectors  of  a  matrix  [3],  we  can  show  that  the  covariance  ma¬ 
trix  of  the  signal  eigenvectors  Sg  and  is,  to  the  first-order. 


E{6sgSs^} 

SNt  3Ns  SNx  SNx  SATx  SATx 

=  EE  EE  EE 

I^g  ai  =  l  02  =  1  &1  =  1  &2  — 1 

1=1  n=l 

Sn,bA.b,  •E{(6  -  C)ai„2(6  -  C):^62} 


{og  —  ai){ah  —  On) 


In this paper we derived expressions for the asymptotic variances of azimuth and range estimates for the higher-order ESPRIT-like algorithm proposed by Challa and Shamsunder. The formulas derived can be used to evaluate the performance of the algorithm.

Abstract

This paper considers the problem of maximum likelihood (ML) estimation for reduced-rank linear regression equations with noise of arbitrary covariance. An explicit expression for the ML estimate of the regression matrix is derived. A generalized likelihood ratio test (GLRT) is also proposed for estimating the rank of the regression matrix. Computer simulations and numerical examples indicate the superiority of the proposed estimator, as compared to a traditional least-squares approach that does not exploit the reduced-rank property in an optimal way.

1  Introduction  and  Preliminaries 

The focus of the present paper is on multivariate linear regression models of the following form:

$$ y(t) = \phi\, x(t) + e(t), \qquad t = 1, \ldots, N \qquad (1) $$

where $y(t) \in R^{m \times 1}$ denotes the noise-obscured output (or explained) variables; $x(t) \in R^{p \times 1}$ is the vector of input (or explanatory) variables; $e(t) \in R^{m \times 1}$ denotes the equation noise; and $\phi \in R^{m \times p}$ is the matrix of regression coefficients, or the parameter matrix for short. The following assumptions on (1) are considered to hold throughout the paper:

A1 The noise is temporally white, i.e.

$$ E[e(t)e^T(s)] = 0 \quad \text{for } t \ne s \qquad (2) $$

and normally distributed with zero mean and unknown covariance matrix,

$$ Q \triangleq E[e(t)e^T(t)]; \qquad |Q| \ne 0 \qquad (3) $$

(Hereafter, E stands for statistical expectation, and $|\cdot|$ denotes the determinant function).

A2 The explanatory variables $x(t)$ are deterministic signals, which are such that

$$ \lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} x(t)x^T(t) = R_{xx}; \qquad |R_{xx}| \ne 0 \qquad (4) $$

$$ \lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} x(t)e^T(t) = 0 \qquad (5) $$

(the second equality above holds with probability one).

A3 The regression matrix $\phi$ may be rank deficient,

$$ \mathrm{rank}(\phi) = n; \qquad n < \bar{n} \triangleq \min(p, m) \qquad (6) $$

but $n$ is unknown ($\phi$ itself is also unknown, of course).
The equation (1), along with the previous assumptions, defines a rank-reduced multivariate linear regression model with quasi-stationary deterministic inputs and random white normal noise of arbitrary covariance. The practical significance of reduced-rank regression modelling is discussed, for example, in [1, 3]. For instance, in large econometric models, several (noise-free) equations may be linearly related to one another, which renders $\phi$ rank deficient. An essentially equivalent situation appears when a low-dimensional linear transformation of the explanatory variables suffices to describe the model outputs. Rank-reduced regression methods for certain signal processing problems are also discussed in e.g. [4, 7]. However, the problems studied in the latter contributions are somewhat different (the regressor $\phi$ is not explicitly modeled as rank-deficient) and the proposed methods are more or less ad hoc from a statistical point of view. Another application of reduced-rank regression occurs in state-space modelling of linear dynamic systems [2]. More exactly, it was shown in [2] that the estimation of the observability matrix (and then of the state-space equation parameters) associated with a linear system, by using subspace-based methods, is basically a reduced-rank linear regression problem as defined herein. The latter problem is also closely related to canonical correlation and factor analysis, and as such it is relevant to array signal processing applications. In fact, the estimation of the rank of a cross-covariance matrix from its sample version can be formulated as a reduced-rank linear regression problem. Note that the former estimation problem occurs in several signal processing and time series applications, including detection of the number of sources in sensor array signal processing (see e.g. [8]).


The distinctive feature of the above model is the reduced rank of
$\phi$. If $n = \mathrm{rank}(\phi)$ were equal to $\bar{n}$, then the
equation (1) would be a standard linear regression, the parameter
estimation of which is well documented in the literature (see, e.g.,
[1, 3, 5]). When $n < \bar{n}$ (as stated in A3), the estimation of the
parameters in (1) is a more complicated problem which has not received
enough attention in the literature.


2  Main  Results 

Let the available observations be

$$\{y(1), x(1), \ldots, y(N), x(N)\}, \quad N > m + p.$$

Under assumption A1, the negative log-likelihood function of the
observed data is given by (to within a constant)

$$L(Q, \phi) = \frac{N}{2}\ln|Q| + \frac{1}{2}\,\mathrm{tr}\Big\{Q^{-1}\sum_{t=1}^{N}[y(t) - \phi x(t)][y(t) - \phi x(t)]^T\Big\}$$

where tr(·) is the trace operator. In view of A3, the ML estimates of
$\phi$ and Q are obtained by solving the following problem,

$$\min_{Q, \phi} L(Q, \phi)$$

under the constraint $\mathrm{rank}(\phi) = n$. The constrained
optimization problem above can be transformed into an unconstrained one
by parameterizing $\phi$ as

$$\phi = AB^T \tag{7}$$

where both $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times n}$
are full rank matrices,

$$\mathrm{rank}(A) = \mathrm{rank}(B) = n.$$

The factorization in (7), of course, is not unique. This fact
complicates the analysis that follows to a certain degree, but the
difficulties induced by the non-uniqueness of the parameterization of
$\phi$ can be overcome.

Introduce the sample covariance matrix

$$\hat{R}_{xx} = \frac{1}{N}\sum_{t=1}^{N} x(t)x^T(t) \tag{8}$$

and similarly for the sample covariances $\hat{R}_{yx}$ and
$\hat{R}_{yy}$. Assuming n to be known, the exact ML estimate of
$\phi$, obtained by explicitly minimizing $L(Q, \phi)$, is given by

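$$\hat{\phi}_{ML} = \hat{R}_{yx}\hat{R}_{xx}^{-1/2}\hat{S}\hat{S}^T\hat{R}_{xx}^{-1/2}, \tag{9}$$

where the columns of the p × n matrix $\hat{S}$ are the n principal
eigenvectors of the matrix $\hat{W}$, given as

$$\hat{W} = \hat{R}_{xx}^{-1/2}\hat{R}_{xy}\hat{R}_{yy}^{-1}\hat{R}_{yx}\hat{R}_{xx}^{-1/2}. \tag{10}$$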

The noise covariance estimate is obtained by inserting
$\hat{\phi} = \hat{\phi}_{ML}$ into the expression

$$\hat{Q} = \frac{1}{N}\sum_{t=1}^{N}[y(t) - \hat{\phi}x(t)][y(t) - \hat{\phi}x(t)]^T.$$

For a proof see [6], where the asymptotic properties of the ML estimate
are also derived. Note that the eigenvalues of $\hat{W}$ are the
so-called canonical correlations and the linearly transformed
explanatory variables $\hat{S}^T\hat{R}_{xx}^{-1/2}x(t)$ are the
canonical variates [1, 3].
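
The estimation step for a known rank can be sketched in a few lines of
NumPy. This is only an illustration, not the numerically reliable
QR-based implementation mentioned for the full paper: it uses the
standard canonical-correlation form of the reduced-rank estimate, which
is assumed here to be what (9) and (10) express, and the function names
`inv_sqrt_psd` and `reduced_rank_ml` are hypothetical.

```python
import numpy as np

def inv_sqrt_psd(R):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(R)
    return (vecs / np.sqrt(vals)) @ vecs.T

def reduced_rank_ml(Y, X, n):
    """Rank-n estimate of phi in y(t) = phi x(t) + e(t).

    Y is m x N and X is p x N (one column per observation).
    Returns (phi_hat, Q_hat, lam); lam holds the eigenvalues of W
    in non-increasing order.
    """
    N = X.shape[1]
    Rxx = X @ X.T / N
    Ryx = Y @ X.T / N
    Ryy = Y @ Y.T / N
    Rxx_mh = inv_sqrt_psd(Rxx)
    # W = Rxx^{-1/2} Rxy Ryy^{-1} Ryx Rxx^{-1/2}  (assumed form of (10))
    W = Rxx_mh @ Ryx.T @ np.linalg.solve(Ryy, Ryx) @ Rxx_mh
    lam, vecs = np.linalg.eigh((W + W.T) / 2)   # symmetrize against round-off
    order = np.argsort(lam)[::-1]
    lam, vecs = lam[order], vecs[:, order]
    S = vecs[:, :n]                             # n principal eigenvectors
    phi_hat = Ryx @ Rxx_mh @ S @ S.T @ Rxx_mh   # assumed form of (9)
    E = Y - phi_hat @ X                         # residuals
    Q_hat = E @ E.T / N
    return phi_hat, Q_hat, lam
```

With n set to min(m, p) the projection onto the principal eigenvectors
becomes inactive and the estimate reduces to ordinary least squares,
which is a convenient sanity check.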

In the more interesting case where n is not known, a generalized
likelihood-ratio test (GLRT) can be performed. Let $\hat{n}$ be a
candidate rank of $\phi$ to be evaluated. The proposed procedure is
formulated as testing

$$H_0 : n = \hat{n}$$

against the opposing hypothesis that $n = \bar{n}$, where
$\bar{n} = \min\{m, p\}$ is the maximum possible rank of $\phi$. The
GLRT statistic for this test is given by

$$C_{\hat{n}} = -\frac{N}{2}\sum_{k=\hat{n}+1}^{\bar{n}} \ln(1 - \hat{\lambda}_k),$$

where $\hat{\lambda}_k$ denote the eigenvalues of $\hat{W}$ in
non-increasing order. Under the null hypothesis $H_0$, the GLRT
variable $2C_{\hat{n}}$ is shown to have an asymptotic chi-squared
distribution with $(m - \hat{n})(p - \hat{n})$ degrees of freedom,

$$2C_{\hat{n}} \overset{as.}{\sim} \chi^2[(m - \hat{n})(p - \hat{n})].$$

The proposed procedure is now to test $H_0$ for increasing values of
$\hat{n}$ (starting at $\hat{n} = 1$ or any a priori known lower bound)
until the hypothesis is accepted. For each $\hat{n}$, $C_{\hat{n}}$ is
compared to a threshold obtained from the tail area of the asymptotic
distribution, and $H_0$ is rejected whenever the statistic exceeds the
threshold.


3 Numerical Examples

The full version of this paper also presents a numerically reliable
implementation of the ML-based detection/estimation scheme. Assuming
$N \gg (m + p)$, as would typically be the case, the bulk of the
implementation is the same QR-factorization used for solving the
ordinary LS-problem. Thus, the only significant complexity increase of
the exact ML method is due to the need for determining n.

In the computer simulations presented below, an arbitrarily selected
$\phi$ of dimensions m = 10, p = 20 and of rank n = 5 is used. The
regression matrix is scaled such that $\|\phi\|_F = 1$, and then fixed
throughout the simulation study. The exact ML estimate is compared to
the ordinary LS estimate, as well as the same estimate truncated to
rank n, using the singular value decomposition. The signal x(t) and the
noise e(t) are both generated as zero-mean white Gaussian random
processes. The covariance matrix of the signal, $R_{xx}$, is held
fixed, whereas the noise covariance matrix is given by

$$\{Q\}_{kl} = \sigma^2(-0.9)^{|k-l|},$$

which is reminiscent of a first order spatial AR-process with a pole at
-0.9. In Figure 1, the total MSE is displayed versus the SNR. The
estimates are calculated using a batch of N = 100 samples. Note that
the MSE of the MLE follows the theoretical curve at high SNR values,
and also that the LS-based methods perform notably worse in this
scenario. The probability of correctly determining the rank of the
regressor is displayed in Figure 2. A significance level of 0.05
(according to the asymptotic distribution of the GLRT variable) is
selected in the detection procedure. The SNR is here fixed at 10 dB and
the number of samples is varied. About N = 300 samples are required for
determining the correct rank of $\phi$ with high probability in this
case. This might seem a large figure, but recall that the number of
estimated parameters in the unconstrained $\hat{\phi}$ is 200. Notice
also that the probability of detection appears to settle at 95% for
large N, as predicted by the theory.

Figure 1. Theoretical (solid curve) and empirical total mean square
error versus signal-to-noise ratio. MLE ('x'), LS ('+') and Modified
LS ('o').
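
The sequential GLRT for the rank, together with the AR-like noise
covariance used in the simulations, can be sketched as follows. This is
a minimal illustration under stated assumptions: the eigenvalues of W
are obtained as squared singular values of the whitened sample
cross-covariance, the chi-squared threshold is approximated with the
Wilson-Hilferty formula (a substitution made here to avoid a statistics
library, not something taken from the paper), and all function names
are hypothetical.

```python
import numpy as np

def ar1_cov(m, sigma2, pole):
    """Covariance with entries sigma2 * pole**|k - l|."""
    k = np.arange(m)
    return sigma2 * pole ** np.abs(k[:, None] - k[None, :])

def chi2_quantile_wh(dof, z):
    """Wilson-Hilferty approximation of a chi-squared quantile.

    z is the standard-normal quantile of the desired level
    (z = 1.6449 for an upper tail area of 0.05).
    """
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def glrt_rank(Y, X, z=1.6449):
    """Smallest candidate rank accepted by the sequential GLRT."""
    m, N = Y.shape
    p = X.shape[0]
    n_bar = min(m, p)
    Rxx, Ryx, Ryy = X @ X.T / N, Y @ X.T / N, Y @ Y.T / N
    Lx = np.linalg.cholesky(Rxx)
    Ly = np.linalg.cholesky(Ryy)
    K = np.linalg.solve(Ly, Ryx)          # Ly^{-1} Ryx
    K = np.linalg.solve(Lx, K.T).T        # Ly^{-1} Ryx Lx^{-T}
    # Squared singular values of K are the nonzero eigenvalues of W.
    lam = np.linalg.svd(K, compute_uv=False) ** 2
    lam = np.clip(lam, 0.0, 1.0 - 1e-12)
    for n in range(1, n_bar):
        two_c = -N * np.sum(np.log1p(-lam[n:]))   # 2 * C_n
        if two_c <= chi2_quantile_wh((m - n) * (p - n), z):
            return n                              # H0 accepted
    return n_bar
```

Raising `z` lowers the false-alarm rate of each individual test at the
cost of a higher risk of underestimating the rank at low SNR.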


4  Conclusions 

The  exact  ML  estimator  for  a  linear  regression 
problem,  where  the  regression  matrix  is  known  to 
be  rank-deficient,  is  derived.  An  explicit  expres¬ 
sion  for  the  estimator  is  found,  employing  a  trun¬ 
cated  canonical  correlation  decomposition.  The 
computational  complexity  is  similar  to  that  of  the 
ordinary  least-squares  (LS)  estimator.  However, 
the  proposed  technique  takes  into  account  the  re¬ 
duced  rank  in  an  optimal  way,  which  can  yield 
a  significant  performance  improvement  in  difficult 
situations. A GLR test is proposed for determining the rank of the
regressor. The asymptotic distribution of the parameter estimates is
presented in the full version of this paper. The computer simulations
indicate that the derived asymptotic results are useful in predicting
the behavior in samples of practical lengths.

Abstract

Adaptive  beamforming  can  be  used  as  a  method  for  esti¬ 
mating  an  unknown  waveform  from  a  source  impinging  on 
an  array  of  sensors.  When  the  direction-of-arrival  (DOA) 
of the incoming signal is known, the minimum variance distortionless
response (MVDR) beamformer provides a distortionless version of the
signal while suppressing noise and
interference.  However,  if  there  is  a  mismatch  between  the 
look  direction  of  the  beamformer  and  the  actual  DOA  of 
the  signal,  there  can  be  significant  degradation  in  perfor¬ 
mance.  In  this  paper,  we  use  a  Bayesian  approach  with  the 
MVDR  criterion  to  derive  an  adaptive  beamformer  which 
has  nearly  optimal  performance  under  good  conditions,  and 
is  robust  to  uncertainty  in  DOA  under  poor  conditions. 


1  Introduction 

Adaptive  beamforming  can  be  used  as  a  method  for  esti¬ 
mating  an  unknown  waveform  from  a  source  impinging  on 
an  array  of  sensors.  When  the  direction-of-arrival  (DOA) 
of  the  incoming  signal  is  known,  the  minimum  variance 
distortionless  response  (MVDR)  beamformer  [1]  provides  a 
distortionless  version  of  the  signal  while  suppressing  noise 
and  interference.  However,  if  the  source  DOA  is  not  known 
exactly,  or  if  the  source  or  array  is  moving,  the  mismatch 
between  the  actual  DOA  of  the  signal  and  the  look  direction 
of  the  beamformer  can  cause  a  significant  degradation  in 
performance  [2]. 

Numerous methods have been proposed to overcome this sensitivity to
pointing errors. These can generally be separated into two categories,
"robust" adaptive beamformers and "responsive" adaptive beamformers.
Robust beamformers reduce sensitivity by widening and flattening the
main beam around the presumed DOA. Some commonly used techniques are to
impose point, derivative, or quadratic constraints on the beamformer
output. Robust techniques


generally  work  well  under  a  wide  range  of  scenarios,  but  sac¬ 
rifice  some  performance  with  respect  to  the  optimal  beam- 
former  informed  of  the  true  DOA. 

Responsive  beamformers  attempt  to  “respond”  to  the  cur¬ 
rent  environment  by  learning  or  estimating  the  signal  DOA 
from  the  observations,  then  using  this  information  as  if  it 
were  known  exactly.  Techniques  of  this  type  include  esti¬ 
mating  the  DOA  directly  using  maximum  likelihood  (ML) 
or  some  other  estimation  procedure,  and  learning  the  DOA 
indirectly  by  estimating  the  signal  subspace.  Under  condi¬ 
tions  where  good  DOA  estimates  can  be  obtained,  i.e.  for 
high  signal-to-noise  ratio  (SNR)  and  a  slowly  fluctuating 
DOA,  the  responsive  techniques  have  nearly  the  same  per¬ 
formance  as  the  beamformer  informed  of  the  true  DOA. 
However,  responsive  techniques  can  have  very  poor  perfor¬ 
mance  under  less  favorable  conditions. 

In this paper, we use a Bayesian approach with the MVDR criterion to
derive an adaptive beamformer which tends to be "responsive" under good
conditions and "robust" under poor conditions.

2  Problem  Formulation 

We consider the problem of recovering, or estimating, the waveform of a
narrowband planewave signal incident on an array of M sensors from DOA
$\theta$. The M × 1 vector of received signals consists of a desired
signal component and an additive noise component and has the form

$$x(t) = a(\theta)s(t) + n(t), \tag{1}$$

where s(t) is the desired signal, n(t) is the M × 1 vector of additive
noise, and $a(\theta)$ is the M × 1 "array response" or "steering
vector" in the direction $\theta$.

The beamformer applies a set of complex weights w to the received
signals and sums to form the beamformer output

$$y(t) = w^H x(t). \tag{2}$$

When the DOA $\theta$ is known, the weights for the MVDR beamformer are
chosen to minimize the output power of the beamformer, $E\{|y(t)|^2\}$,
while maintaining a distortionless response in the direction of the
desired signal. The weights are found from the solution to

$$\min_w w^H R_x w \quad \text{subject to} \quad a(\theta)^H w = 1, \tag{3}$$

where $R_x$ is the data correlation matrix

$$R_x = E\{x(t)x(t)^H\}. \tag{4}$$

The MVDR weight vector has the form

$$w = \frac{R_x^{-1}a(\theta)}{a(\theta)^H R_x^{-1} a(\theta)}. \tag{5}$$

When $\theta$ is not known exactly, sensitivity to pointing errors can
be reduced by imposing constraints on the shape of the main beam to
widen and flatten it. One possibility is to impose constraints on the
beamformer output at K values of $\theta$ near the presumed DOA. The
weights are found from

$$\min_w w^H R_x w \quad \text{subject to} \quad C^H w = c, \tag{6}$$

where C is the M × K matrix of steering vectors for the constrained
DOAs

$$C = [a(\theta_1) \cdots a(\theta_K)], \tag{7}$$

and c is the K × 1 vector of constraints. For a distortionless response
to all the constrained DOAs, c is a vector of ones. The constrained
weight vector has the form

$$w = R_x^{-1}C(C^H R_x^{-1} C)^{-1}c. \tag{8}$$

Additional constraints can improve robustness to pointing errors, but
hamper noise cancellation because they reduce adaptive degrees of
freedom.

Alternatively, the unknown DOA $\theta$ can be estimated from L
snapshots of the received data vector taken at times
$t_1, \ldots, t_L$,

$$x_L = [x(t_1)^T \cdots x(t_L)^T]^T. \tag{9}$$

The weight vector then has the same form as (5) with $\theta$ replaced
by $\hat{\theta}(x_L)$,

$$w = \frac{R_x^{-1}a(\hat{\theta}(x_L))}{a(\hat{\theta}(x_L))^H R_x^{-1} a(\hat{\theta}(x_L))}. \tag{10}$$

This technique works well when the observed data is sufficient to yield
good estimates of the DOA, but can result in significant mismatch when
the estimates are poor.

In practice, the data correlation matrix $R_x$ is rarely known, and the
beamformer weights in (5), (8), and (10) are implemented by
substituting an estimate of $R_x$ such as the sample correlation matrix
obtained from N snapshots of the data

$$\hat{R} = \frac{1}{N}\sum_{i=1}^{N} x(t_i)x(t_i)^H. \tag{11}$$

The number of snapshots, N, used in estimating $\hat{R}$ and the number
of snapshots, L, used in estimating $\theta$ need not be the same. Both
are chosen to tune the performance of the processor for the situation
at hand. As a rule, both are set large enough so that good estimates of
the desired quantity can be obtained, but small enough so that the
estimates can follow temporal fluctuations.

3 Beamformer

We will use a Bayesian approach with the MVDR criterion to derive an
adaptive beamformer which tends to be "responsive" under good
conditions and "robust" under poor conditions. It is assumed that
$\theta$ is a random parameter with a priori probability density
function (pdf) $q(\theta)$, which reflects the level of uncertainty in
the source DOA. The Bayesian approach has been used for detecting
signals under directional uncertainty in [3], with averaging over the a
priori pdf $q(\theta)$. The resulting detector was robust, but required
numerical integration over the a priori pdf. In order to obtain a
simpler and more responsive beamformer, we will use a technique similar
to that in [4]. Where averaging is needed, we will use the a posteriori
pdf $p(\theta|x_L)$ given L snapshots of the data vector. Furthermore,
we will assume that $q(\theta)$ is defined only on a discrete set of P
points, $\Theta = \{\theta_1 \cdots \theta_P\}$, in the a priori
parameter space.

The objective is still to minimize the output power, but now the
constraint is for a distortionless response on the average, i.e.,

$$\min_w w^H R_x w \quad \text{subject to} \quad \bar{a}^H w = 1, \tag{12}$$

where $\bar{a}$ is an average, or composite, steering vector averaged
over $p(\theta|x_L)$

$$\bar{a} = \sum_{i=1}^{P} a(\theta_i)p(\theta_i|x_L) = Ap, \tag{13}$$

where A is the M × P matrix of steering vectors

$$A = [a(\theta_1) \cdots a(\theta_P)], \tag{14}$$

and the i-th element of p is $p(\theta_i|x_L)$.

This results in beamformer weights of the form

$$w = \frac{\hat{R}^{-1}Ap}{p^T A^H \hat{R}^{-1} A p}, \tag{15}$$

where $\hat{R}$ has been substituted for $R_x$. A similar beamformer
was derived in [5] under different considerations. In [5], the vector p
is not related to the a posteriori pdf, but is determined from a
complicated optimization rule.

If we assume that the source and noise waveforms are sample functions
of uncorrelated, zero-mean, stationary Gaussian random processes with
variance $\sigma_s^2$ and covariance $\sigma_n^2 I$, respectively, then
$p(x_L|\theta_i)$ is a complex, zero-mean, Gaussian density with
covariance

$$R_x(\theta_i) = \sigma_s^2 a(\theta_i)a(\theta_i)^H + \sigma_n^2 I. \tag{16}$$

Applying Bayes rule, $p(\theta_i|x_L)$ has the form:

$$p(\theta_i|x_L) = \frac{q(\theta_i)\exp\{\beta L\, a(\theta_i)^H \hat{R}_x a(\theta_i)\}}{\sum_{k=1}^{P} q(\theta_k)\exp\{\beta L\, a(\theta_k)^H \hat{R}_x a(\theta_k)\}} \tag{17}$$

where $\hat{R}_x$ is the sample correlation matrix of $x_L$, and
$\beta$ is a monotonically increasing function of SNR
($\gamma = \sigma_s^2/\sigma_n^2$):

$$\beta = \frac{1}{\sigma_n^2}\,\frac{\gamma}{1 + M\gamma}. \tag{18}$$

The SNR is not usually known, but $\beta$ can be viewed as a variable
which may be adjusted to tune the responsiveness of the beamformer to
the source SNR, just as N and L can be chosen to tune temporal
responsiveness. The beamformer is updated in two steps. First the a
posteriori pdf is found from (17), then the weights are calculated from
(15).
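
The two-step update can be sketched as follows. The sketch assumes a
half-wavelength uniform linear array with DOA expressed as
u = sin(theta), and it assumes that the exponent of the a posteriori
pdf is beta times L times the beamformer output power, with beta
treated as a tuning parameter as described above. The function names
are illustrative only.

```python
import numpy as np

def steering_ula(u, M):
    """Steering vector of a half-wavelength ULA; u = sin(theta)."""
    return np.exp(1j * np.pi * u * np.arange(M))

def posterior_doa(R_hat, A, q, beta, L):
    """A posteriori probabilities over the DOA grid (columns of A)."""
    # a(theta_i)^H R_hat a(theta_i) for every grid point
    power = np.real(np.einsum('mi,mn,ni->i', A.conj(), R_hat, A))
    log_post = np.log(q) + beta * L * power
    log_post -= log_post.max()        # avoid overflow in exp
    post = np.exp(log_post)
    return post / post.sum()

def bayesian_weights(R_hat, A, post):
    """MVDR-type weights with the composite steering vector A p."""
    a_bar = A @ post
    Ri_a = np.linalg.solve(R_hat, a_bar)
    return Ri_a / (a_bar.conj() @ Ri_a)
```

Working with log-probabilities keeps the posterior well defined even
when beta times L is large and the exponentials would otherwise
overflow.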

The beamformer uses the same amount of observed data as was used in
estimating $\theta$ in (10), and similar a priori information in
determining the a priori pdf $q(\theta)$ as was needed in defining the
point constraints in (8). In this beamformer, increasing the number of
DOAs in $\Theta$ does not reduce adaptive degrees of freedom, because
they are averaged to form a composite steering vector. Adding points
increases the computational complexity, and the number of points is
chosen to cover the a priori parameter space sufficiently densely while
keeping the computational requirements low.


4  Performance  Example 

We now consider a simple example to illustrate the performance of the
proposed "a posteriori" beamformer as compared to the MVDR beamformer
informed of the source DOA, a "responsive" beamformer which uses the
maximum likelihood estimate (MLE) of the DOA, and a "robust" beamformer
which uses a set of point constraints over the a priori interval. The
array is a uniform linear array (ULA) with half-wavelength spacing and
M = 8 elements. The a priori uncertainty in the DOA is over the region
$u = \sin(\theta) \in [-0.3, 0.3]$. For an 8-element array, this
interval is slightly larger than the width of the mainlobe in the ideal
beampattern. The set $\Theta$ is composed of P = 13 evenly spaced
points on the interval [-0.3, 0.3]. For the constrained beamformer, we
must use fewer than 8 constraints. Five distortionless constraints were
used at the points {-0.3, -0.15, 0, 0.15, 0.3}. The source DOA was
chosen to be $u_s = 0.223$, which does not coincide exactly with any of
the constraint points or any of the points in $\Theta$.
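
For reference, the informed MVDR weights (5) and the point-constrained
weights (8) for this geometry can be sketched as below. The sketch
assumes a known correlation matrix and spatially white noise of unit
power, and it defines array gain as output SNR over input SNR; these
choices, and the function names, are assumptions made for illustration
rather than details taken from the paper.

```python
import numpy as np

def steering_ula(u, M):
    """Steering vector of a half-wavelength ULA; u = sin(theta)."""
    return np.exp(1j * np.pi * u * np.arange(M))

def mvdr_weights(R, a):
    """w = R^{-1} a / (a^H R^{-1} a), as in (5)."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)

def lcmv_weights(R, C, c):
    """w = R^{-1} C (C^H R^{-1} C)^{-1} c, as in (8)."""
    Ri_C = np.linalg.solve(R, C)
    return Ri_C @ np.linalg.solve(C.conj().T @ Ri_C, c)

def array_gain(w, a_src):
    """Output SNR over input SNR for spatially white noise."""
    return np.abs(w.conj() @ a_src) ** 2 / np.real(w.conj() @ w)

M, u_src = 8, 0.223
a_src = steering_ula(u_src, M)
# True correlation matrix for a 0 dB source in unit-power white noise.
R = np.outer(a_src, a_src.conj()) + np.eye(M)
points = (-0.3, -0.15, 0.0, 0.15, 0.3)
C = np.stack([steering_ula(u, M) for u in points], axis=1)
w_mvdr = mvdr_weights(R, a_src)
w_lcmv = lcmv_weights(R, C, np.ones(len(points)))
```

In this white-noise setting the informed MVDR weights attain the
maximum gain M, while the five point constraints use up degrees of
freedom and the constrained weights fall short of that value.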

In Figures 1-4, typical performance is illustrated for a high SNR
(0 dB) and low SNR (-20 dB) case, respectively. Figures 1 and 3 show
the a posteriori pdf and typical beampatterns for a single trial in the
two cases, and Figures 2 and 4 show a histogram of array gain for the
different beamformers obtained from 500 trials. In the high SNR case,
the a posteriori pdf is sharply peaked near the true DOA and the
Bayesian beamformer, as well as the informed and MLE beamformers, have
nearly the same beampatterns, providing high gain to the source, and
relatively low gain elsewhere. The array gain is relatively stable over
all trials and close to the optimal value of M = 8 (9 dB). The
constrained beamformer, in attempting to provide good gain over the
entire a priori interval, does not suppress noise as well as the other
beamformers, and has a lower array gain, close to 0 dB.

At low SNR, the a posteriori pdf is nearly equal to the a priori pdf,
with some small peaks. The MLE of the DOA attempts to find the most
likely estimate, but is not always accurate, resulting in a beamformer
which does not always point at the desired signal. The histogram of
array gain values for the MLE beamformer shows that the MLE is accurate
enough to provide optimal performance only about half of the time, and
can be so inaccurate as to reduce array gain as low as -25 dB. The
constrained beamformer still provides good gain over the entire a
priori region and a stable array gain of about 0 dB. Our Bayesian
beamformer is now more robust, providing reasonable gain over the
entire a priori interval, with increased gain at local maxima in the a
posteriori pdf. The array gain is stable near a value which is less
than optimal, but still better than the constrained processor.

Figure 1. A posteriori pdf and beampatterns of adaptive beamformers for
SNR = 0 dB. A priori interval = [-0.3, 0.3], source DOA = 0.223.

Figure 2. Histogram of array gain for beamformers from 500 trials for
SNR = 0 dB.

Figure 3. A posteriori pdf and beampatterns of adaptive beamformers for
SNR = -20 dB. A priori interval = [-0.3, 0.3], source DOA = 0.223.

Figure 4. Histogram of array gain for beamformers from 500 trials for
SNR = -20 dB.

Abstract

Maximum-entropy  positive-definite  completion  for 
partially-specified  Toeplitz  covariance  matrices  is  devel¬ 
oped  for  DOA  estimation  in  partially-augmentable  antenna 
arrays  (those  that  have  an  incomplete  set  of  covariance 
lags). 


1.  Introduction 

This paper considers the problem of DOA (direction-of-arrival)
estimation for multiple uncorrelated plane waves incident upon
partially-augmentable antenna arrays. This type of array has an
incomplete set of covariance lags [6]. Specifically, consider a
nonuniform linear array (NLA) geometry specified by the sensor
positions $d_i$ (i = 1, ..., M) and set $d_1 = 0$ for convenience. Let
the unit spacing d be the greatest common divisor of the difference set

$$\mathcal{D} = \{d_i - d_j \mid i, j = 1, \ldots, M;\ i > j\}. \tag{1}$$

Denote the maximum inter-element distance (array aperture) by
$d(M_a - 1)$. Fully-augmentable arrays have the property that all
intermediate integral distances are realised; i.e., given the sequence
of natural numbers $\kappa = 1, \ldots, M_a - 1$, we have
$\kappa d \in \mathcal{D}$. On the other hand, partially-augmentable
arrays have some nonzero number (G) of missed lags ("gaps"). It is
clear that a partially-augmentable array gives rise to an incomplete
augmented covariance Toeplitz matrix T, since some lags are missing.
Thus both the spatial covariance matrix estimation problem and the
spatial spectrum estimation problem must be formulated as p.d.
(positive definite) Toeplitz completion problems.
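
The difference set and the missing lags are easy to enumerate for a
given geometry. The short sketch below assumes integer sensor positions
(for example, in half-wavelengths); the function name `lag_structure`
is illustrative.

```python
from functools import reduce
from math import gcd

def lag_structure(positions):
    """Return (unit spacing d, M_a, list of missing lags)."""
    # Difference set: all positive pairwise position differences.
    diffs = sorted({a - b for a in positions for b in positions if a > b})
    d = reduce(gcd, diffs)              # unit spacing
    lags = {x // d for x in diffs}      # realised lags in units of d
    m_a = max(lags) + 1                 # aperture is d * (M_a - 1)
    missing = [k for k in range(1, m_a) if k not in lags]
    return d, m_a, missing

# Five-element geometry with positions 0, 1, 4, 9, 11.
print(lag_structure([0, 1, 4, 9, 11]))   # (1, 12, [6])
```

An empty list of missing lags identifies a fully-augmentable array,
as for the four-element geometry {0, 1, 4, 6}.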

The latter problem is investigated in this paper for two cases. Firstly,
we consider the case where the available covariance lags are supposed
to be precisely known (deterministic completion). Here we define the
unique maximum-entropy p.d. Toeplitz completion and discuss its
DOA-estimation performance under the condition that the number of
uncorrelated plane waves (m) exceeds the number of antenna elements
(M).

*This study was partly supported by the INTAS SASPARC grant.

Secondly, we shall investigate the case where we assume we have
sufficient statistics for the DOA estimates in the form of the direct
data covariance (DDC) matrix $\hat{R}$, obtained by sample averaging on
a set of N independent vectors ("snapshots") originating from a complex
Gaussian distribution $\mathcal{CN}(M, 0, R)$.

Our  benchmark  will  be  the  limiting  accuracy  provided 
by  the  Cramer-Rao  bound. 

2.  Deterministic  Matrix  Completion 

Consider the covariance matrix of an M-element sparse array with
assumed Gaussian processes observed as a combination of m uncorrelated
plane waves with DOA's $\theta = [\theta_1, \ldots, \theta_m]$, powers
$P = \mathrm{diag}[p_1, \ldots, p_m]$ and white noise of power
$\sigma$:

$$R = BPB^H + \sigma I_M \tag{2}$$

where the signal manifold matrix
$B = [B(\theta_1), \ldots, B(\theta_m)]$,

$$B(\theta_i) = \left[1, \exp\left(\mathrm{i}2\pi\frac{d_2}{\lambda}\sin\theta_i\right), \ldots, \exp\left(\mathrm{i}2\pi\frac{d_M}{\lambda}\sin\theta_i\right)\right]^T \tag{3}$$

is the so-called steering vector, and $\lambda$ is the wavelength of
incident radiation.

Let the set of presented covariance lags be $\mathcal{S}$, where
$r_\kappa = R_{ij}$, $\kappa = (d_i - d_j)/d$, i, j = 1, ..., M, and
let the set of missing lags be $\bar{\mathcal{S}}$ (in the language of
matrix completion theory, these are the specified and unspecified
values respectively). Fortunately the deterministic augmented
covariance matrix $T = \mathrm{Toep}[r_\kappa]$ has at least one p.d.
Toeplitz completion ($T_{ULA}$, which is the covariance matrix of the
corresponding uniform linear array) and hence the feasibility of the
covariance matrix estimation problem is guaranteed.

Let the set of all possible p.d. Toeplitz completions $\mathcal{T}$ be
introduced as follows:

$$\mathcal{T} = \Big\{z : T(z) = T + \sum_{\kappa \in \bar{\mathcal{S}}} (z_\kappa E_\kappa + z_\kappa^* E_\kappa^T) > 0\Big\} \tag{4}$$

where $E_\kappa$ is the $M_a \times M_a$ shift matrix with ones on its
$\kappa$-th subdiagonal and zeros elsewhere, so that $E_\kappa^T$ has
ones on the $\kappa$-th superdiagonal:

$$[E_\kappa]_{ij} = \begin{cases} 1, & i - j = \kappa \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
It can be shown [2] that the function

$$\phi(z) = \begin{cases} \log\det T^{-1}(z), & z \in \mathcal{T} \\ \infty, & \text{otherwise} \end{cases} \tag{6}$$

is strictly convex on the feasible set, and so has a unique minimiser
which we denote by

$$z_{ME} := \arg\min \phi(z) = \arg\max \det T(z). \tag{7}$$

We refer to $z_{ME}$ as the analytic centre of the linear matrix
inequality T(z) > 0. For Gaussian distributions, we may evidently treat
the analytic centre as a maximum-entropy completion.

A recently-developed convex programming approach [2] can be directly
applied to find $z_{ME}$. The Ellipsoid Algorithm is first applied to
find an arbitrary feasible point $z_0$ such that $T(z_0) > 0$. If such
a feasible point is found, any convergent minimiser of $\phi(z)$ should
find the analytic centre. Nesterov and Nemirovskii's [5] version of
Newton's method has been implemented here to find the optimum solution.
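
A toy version of the maximum-entropy completion is sketched below. It
is not the Ellipsoid/Newton scheme described above: plain gradient
ascent on log det T(z) with a backtracking line search is substituted
for simplicity, and the caller is assumed to supply a feasible starting
point (the role played by the Ellipsoid Algorithm in the text). The
function names are illustrative.

```python
import numpy as np

def toeplitz_h(r):
    """Hermitian Toeplitz matrix with T[i, j] = r[i - j] for i >= j."""
    n = len(r)
    idx = np.arange(n)[:, None] - np.arange(n)[None, :]
    return np.where(idx >= 0, r[np.abs(idx)], np.conj(r[np.abs(idx)]))

def logdet_pd(T):
    """log det T, or -inf when T is not positive definite."""
    try:
        chol = np.linalg.cholesky(T)
    except np.linalg.LinAlgError:
        return -np.inf
    return 2.0 * np.sum(np.log(np.real(np.diag(chol))))

def me_completion(r, missing, iters=2000, tol=1e-10):
    """Fill the lags listed in `missing` by maximizing log det T."""
    r = np.array(r, dtype=complex)
    f = logdet_pd(toeplitz_h(r))
    if not np.isfinite(f):
        raise ValueError("starting point is not feasible")
    for _ in range(iters):
        T_inv = np.linalg.inv(toeplitz_h(r))
        # Ascent direction for lag k: sum of the k-th subdiagonal of T^{-1}.
        g = np.array([np.trace(T_inv, offset=-k) for k in missing])
        g2 = np.real(np.vdot(g, g))
        if np.sqrt(g2) < tol:
            break
        step = 1.0
        while step > 1e-16:
            trial = r.copy()
            trial[missing] += step * g
            f_trial = logdet_pd(toeplitz_h(trial))
            # Armijo test; the slope along g is 2 * ||g||^2.
            if f_trial >= f + 0.3 * step * 2.0 * g2:
                r, f = trial, f_trial
                break
            step *= 0.5
        else:
            break   # no acceptable step: stop at the current point
    return r
```

A simple check is the band-extension case: with r_0 = 1 and r_1 = 0.5
specified and lags 2 and 3 missing, the completion should approach the
first-order autoregressive extension r_2 = 0.25, r_3 = 0.125.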

Thus the unique maximum-entropy p.d. Toeplitz completion may now be
found for the partially specified covariance matrix T induced by the
partially-augmentable array. Since the ME completion $T(z_{ME})$ does
not coincide with the true $M_a$-variate covariance matrix R, a further
step is proposed to truncate the signal subspace dimension.

The ME-optimum completion may be treated as an initial estimate to the
solution $T_{mME}$ of the following optimisation problem:

$$\text{Find} \ \inf \|T(z_{ME}) - T_m\|_F \quad \text{subject to} \quad T_m \in \mathcal{C} := \mathcal{C}_1 \cap \mathcal{C}_2 \tag{8}$$

where

$$\mathcal{C}_1 = \{T_m \in \mathcal{H}^{M_a} : T_m \text{ is Toeplitz and p.d.}\}$$

$$\mathcal{C}_2 = \{T_m : (T_m - \lambda_{min} I) \text{ is of rank } m\},$$

and where $\mathcal{H}^p$ is the space of p × p Hermitian matrices, and
$\lambda_{min}$ is the minimum eigenvalue of $T_m$, of multiplicity
$(M_a - m)$. Convergent alternating projection methods described in [3]
are used to find the (at least local) extremum for this problem.


3.  Stochastic  Matrix  Completion 


When the direct augmentation approach [1] is used to obtain the
specified lags $\hat{r}_\kappa$ via the stochastic DDC matrix
$\hat{R}$, the feasibility condition is no longer guaranteed to hold.
In our terms, the feasibility condition deals with conditions under
which a p.d. Toeplitz matrix completion exists for the given
$\hat{r}_\kappa$, $\kappa \in \mathcal{S}$.

Unfortunately, the necessary and sufficient feasibility conditions for
the general p.d. Toeplitz completion problem have not yet been found
[4]. One of the obvious necessary conditions is that every specified
principal sub-matrix should be positive definite. Denoting the greatest
specified sub-matrix of $\hat{T}$ by $\hat{T}_n$, this necessary but
obviously not sufficient condition is

$$\hat{T}_n > 0. \tag{10}$$

Thus the success of the Ellipsoid Algorithm can be treated as our only
feasibility condition. When the condition at Eqn. (10) is not
satisfied, or the Ellipsoid Algorithm fails to find a feasible point,
we need to modify the initial set of specified and unspecified sample
covariance lags in order to achieve feasibility with the minimum
possible deflection from the initial set of (maximum likelihood)
estimates $\hat{r}_\kappa$, $\kappa \in \mathcal{S}$. Let


$$\hat{T}(z) = \hat{T} + \sum_{\kappa \in \mathcal{S}} (z_\kappa E_\kappa + z_\kappa^* E_\kappa^T) + \sum_{\kappa \in \bar{\mathcal{S}}} (z_\kappa E_\kappa + z_\kappa^* E_\kappa^T) \tag{11}$$

then the minimum deflective feasible point search is:

$$\text{Find} \ \min \sum_{\kappa \in \mathcal{S}} |z_\kappa|^2 \quad \text{subject to} \quad \hat{T}(z) > 0. \tag{12}$$

Procedures elaborated in [2] provide a straightforward approach to
finding the unique optimum solution for this problem.

One such procedure uses the Ellipsoid Algorithm to find the optimal
feasible point $\hat{T}(z_0^{opt})$. Another approach adopts the
penalty function method, which admits simultaneous completion and
deflection:

$$\text{Find} \ \min \Phi(z) = \log\det \hat{T}^{-1}(z) + \mu \sum_{\kappa \in \mathcal{S}} |z_\kappa|^2 \tag{13}$$

$$\text{subject to} \quad \hat{T}(z) > 0, \tag{14}$$

$$\text{with} \quad \hat{T}(z) = \hat{T}_0 + \sum_{\kappa=1}^{M_a - 1} (z_\kappa E_\kappa + z_\kappa^* E_\kappa^T). \tag{15}$$

Evidently, as $\mu \to \infty$ we expect the optimum solution of this
problem to coincide with the previous approach. Since both the entropy
and the deflection norms relate to the DOA estimation accuracy, an
appropriate trade-off between the two which optimises the DOA
estimation accuracy is desired.

4. Simulation Results

To demonstrate the efficiency of the proposed ME completion approach,
we consider a sparse array with M = 5 elements. The sensors are at
positions d = {0, 1, 4, 9, 11}, measured in half-wavelengths. The
single missed covariance lag is $r_6$. We start with the deterministic
matrix completion problem, where the exact values of the specified
covariance lags were calculated. We have m = 8 sources having a common
SNR of 20 dB with respect to white noise, i.e. $p_i = 100$;
$\sigma = 1$ in Eqn. (2). The sources are uniformly separated in
spatial frequency ($w = \pi\sin\theta$) and the inter-source separation
is $\Delta w = 0.08$ for the first scenario (Fig. 1) and
$\Delta w = 0.12$ for the second scenario (Fig. 2).

The dotted line in Fig. 1 demonstrates the behaviour of the ME spectrum
obtained when the 12-variate spatial covariance matrix $T_{ME}$ is
restored via the ME-completion algorithm. For comparison, the dashed
line shows the ME spectrum obtained for the corresponding 12-element
uniform linear array exact covariance matrix $T_{ULA}$. Also
illustrated are the DOA's estimated by root-MUSIC applied to the
ME-completed covariance matrix $T_{ME}$. Fig. 2 compares the behaviour
of the ME spectra and root-MUSIC DOA estimates for the ME-completion
algorithm with the associated ULA spectra for the case
$\Delta w = 0.12$.

These  results  demonstrate  that  the  ME  spectra  obtained 
via  the  ME-completion  algorithm  for  sparse  arrays  practi¬ 
cally  coincide  with  the  corresponding  ULA  ME  spectrum. 
In  this  sense  the  “maximum  entropy”  properties  of  the  ULA 
are  fully  restored  by  the  ME-completion  approach  applied 
to  the  sparse  array. 

Meanwhile, for both scenarios, the ME spectral maxima do not
correspond to the true DOA's and even the number of main peaks
is erroneous. Nevertheless, for the greater spatial separation,
the root-MUSIC DOA estimates calculated for $T_{ME}$ locate the
true DOA's with negligible errors. For the smaller separation,
the root-MUSIC DOA estimates are essentially erroneous. This once
again demonstrates that the maximum entropy criterion is
inconsistent with the harmonic analysis criterion, especially
for severe “super-resolution” conditions.

To verify this, Fig. 3 illustrates the behaviour of the
Cramer-Rao bound for DOA estimation accuracy as a function of
spatial frequency separation $\Delta w$ for the same 8-source
scenario with N = 1000 snapshots. The maximum RMSE from the
eight sources is depicted by the dotted line. Note that a
separation of $\Delta w = 0.08$ is far beyond the realistic
resolution capabilities of the antenna examined. However, in the
area where the accuracy limit is reasonable ($\Delta w > 0.15$),
ME-completion applied to the deterministic matrix provides
practically unbiased DOA estimates via the MUSIC / root-MUSIC
approach. Thus, for situations in this region, our approach
provides asymptotically unbiased estimates, and we may now
analyse the mean and RMSE of the stochastic errors for the
finite sample size N.

The solid and dashed lines in Fig. 3 illustrate the maximum
sample DOA RMSE and bias respectively for the eight sources as
a function of the inter-source spatial separation. In each of
the 1000 trials, the MUSIC algorithm was applied to the p.d.
finite signal subspace Toeplitz matrix $T_{mME}$, which is
obtained via the ME completion algorithm with an initially
modified data set (Eqns. (11) and (12)).

These results clearly define the pre-asymptotic domain in this
case as $\Delta w < 0.16$, where the false peaks of the MUSIC
sample pseudo-spectra often give rise to completely erroneous
DOA estimates. Following [1], we define “abnormal” DOA estimates
as those estimates $\hat{w}_i$ lying outside a prescribed range
around the true value $w_i$. For this simulation, a total of 627
abnormal trials were rejected (in the process of ensuring that
1000 normal trials were finally obtained) for $\Delta w = 0.15$,
while there were 29 abnormal trials for $\Delta w = 0.16$. In the
asymptotic domain $\Delta w > 0.16$ (where the number of abnormal
estimates vanishes), DOA estimation accuracy is reasonably close
to the Cramer-Rao bound, similarly to fully-augmentable antenna
arrays [1], although the bias here remains nonzero.

It  is  interesting  to  note  that  in  this  simulation,  the  nec¬ 
essary  condition  Tn^  >  0  was  able  to  detect  between  0% 
and  60%  of  all  the  initial  unfeasible  sets  of  covariance  lag 
estimates,  depending  on  source  separation. 

In  some  simulations,  minimum-deflection  completion 
(Eqns. (11) - (12)) is significantly improved by the penalty
method  (Eqns.  (13)  -  (15)).  For  example,  for  m  =  6  sources 
separated  by  Aw  =  0.19  and  a  penalty  value  p.  =  10“^,  the 
maximum RMSE and maximum bias are reduced from 0.011
to 0.005 and 0.034 to 0.021 respectively.

5.  Summary 

The  above  convex  programming  technique  provides 
a  unique  solution  to  the  problem  of  maximum-entropy 
p.d.  Toeplitz  completion  (spectral  estimation)  for  the  in¬ 
complete  Toeplitz  augmented  covariance  matrix  that  meets 
the  feasibility  condition.  When  the  Ellipsoid  Algorithm  fails 
to  find  an  arbitrary  feasible  point  for  stochastic  covariance 
lag  estimates,  the  modification  approach  is  proposed. 

It  has  been  shown  that  the  deterministic  ME  spectra  ob¬ 
tained  by  this  technique  are  practically  identical  to  the  ULA 
ME  spectra  for  all  situations  examined.  However,  this  sim¬ 
ilarity  does  not  necessarily  imply  that  the  corresponding 
root-MUSIC  DOA  estimates  are  true. 

For  situations  when  the  number  of  abnormal  estimates 
vanishes,  the  actual  DOA  estimation  accuracy  obtained  by 
this  new  approach  is  demonstrated  to  be  reasonably  close  to 
the  Cramer-Rao  limit. 
Figure 1. Deterministic completion, $\Delta w = 0.08$. (Plot title: Exact lags, d = [0, 1, 4, 9, 11], (G = 1, R = 0, Nmax = 5), m = 8, sep = 0.08, SNR = 20 dB; horizontal axis: DOA (w).)

Figure 2. Deterministic completion, $\Delta w = 0.12$. (Plot title: Exact lags, d = [0, 1, 4, 9, 11], (G = 1, R = 0, Nmax = 5), m = 8, sep = 0.12, SNR = 20 dB.)
Abstract

A tree structured Expectation Maximization (EM) algorithm is
proposed and applied to wide-band angle of arrival estimation.
It may be seen as a generalization of EM using the ideas of the
Cascade EM algorithm and the Space Alternating Generalized EM
algorithm. Also, for passive data acquisition, robust and
efficient alternatives for the estimation of the source
signals are investigated.


1.  Introduction 

In  many  data  acquisition  systems,  reception  data 
acquired  by  an  array  of  sensors  are  processed  to  ob¬ 
tain  information  about  the  source  locations.  When 
the  sources  are  located  relatively  far  away  from  the 
sensors,  only  the  direction  of  arrivals  of  the  acquired 
source  signals  can  be  reliably  obtained.  Although  the 
Maximum  likelihood  (ML)  estimation  provides  more 
accurate  estimates  for  the  direction  of  arrivals,  due  to 
the  higher  computational  cost  of  obtaining  the  ML  esti¬ 
mates,  it  has  not  found  much  use  in  practice.  However, 
by  exploiting  the  superposition  property  of  the  data 
acquisition  system,  the  complexity  of  the  ML  estima¬ 
tion  can  be  greatly  reduced  by  using  the  Expectation 
Maximization (EM) algorithm [1, 2, 5]. In the EM formalism,
the observation (the incomplete data) is obtained
via a many-to-one mapping from the complete data
space  that  includes  signals  which  we  would  obtain  as 
the  sensor  outputs  if  we  were  able  to  observe  the  effect 
of  each  source  separately.  The  EM  algorithm  iterates 
between  estimating  the  log-likelihood  of  the  complete 
data  using  the  incomplete  data  and  the  current  param¬ 
eter  estimates  (E-step)  and  maximizing  the  estimated 
log-likelihood function to obtain the updated parameter
estimates (M-step). Under mild regularity conditions, the
iterations of the EM algorithm converge to a stationary point
of the observed log-likelihood function, where at each iteration
the likelihood of the estimated parameters is increased [11].
In this study, a
tree  structured  hierarchy  is  used  for  the  description  of 
relation  between  the  complete  data  space  and  the  ob¬ 
servations.  Within  this  hierarchy  it  is  possible  to  com¬ 
bine  in  one  algorithm  the  ideas  of  the  Cascade  EM  and 
Space Alternating Generalized EM algorithms [3, 7].
For  the  estimation  of  unknown  signals  arriving  from 
different  directions  to  a  passive  array,  alternative  regu¬ 
larized  estimation  schemes  to  the  common  least  squares 
solution  are  investigated.  For  this  purpose  two  differ¬ 
ent  methods  are  used.  The  first  one  is  an  adaptive 
Tikhonov  type  regularized  least-squares  (RLS)  estima¬ 
tion  method,  which  is  computationally  intensive  and 
the  second  one  is  an  averaged  least-squares  estimation 
(LSSET)  method  over  a  set  of  angles  in  a  neighbor¬ 
hood  of  the  nominal  angles.  It  has  been  demonstrated 
that  when  RLS  or  LSSET  methods  are  used  in  the  es¬ 
timation  of  the  received  signals,  the  EM  algorithm  has 
better  convergence  behavior. 

2.  Signal  Model 

For the case of M sources with directions of arrival $\theta_l$,
$1 \le l \le M$, the measured signal at the i'th sensor of an
array with P sensors is

$$y_i(t) = \sum_{l=1}^{M} a_i(t, \theta_l) * s_l(t - \tau_i(\theta_l)) + u_i(t), \quad 1 \le i \le P, \quad t = 0, T, 2T, \ldots, (N-1)T \qquad (1)$$

where $*$ denotes convolution in time, $s_l(t)$ is the wide-band
signal of the l'th source, $u_i(t)$ is the zero-mean spatially and
temporally white Gaussian noise at the i'th sensor,
$\tau_i(\theta)$ is the time delay of the source signal from the
direction $\theta$ as it propagates to the i'th sensor relative to
the phase center of the array, and $a_i(t, \theta)$ is the time
domain function for the gain of the i'th sensor, which depends on
frequency and the direction of arrival $\theta$. The frequency
domain representation of (1) is,

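$$Y_i(k) = \sum_{l=1}^{M} A_i(k, \theta_l)\, e^{-j\omega_k \tau_i(\theta_l)} S_l(k) + U_i(k), \quad 1 \le i \le P, \quad 0 \le k < F \qquad (2)$$

where F is the DFT size, which is chosen sufficiently large,
$\omega_k$ is the angular frequency of the k'th DFT bin, and
$Y_i(k)$, $A_i(k, \theta)$, $S_l(k)$ and $U_i(k)$ are the
transforms of $y_i(t)$, $a_i(t, \theta)$, $s_l(t)$ and $u_i(t)$
respectively. Let the following definitions be made (' is the
transpose operator)

$$\mathbf{b}(k, \theta) = [A_1(k, \theta) e^{-j\omega_k \tau_1(\theta)} \cdots A_P(k, \theta) e^{-j\omega_k \tau_P(\theta)}]' = [b_1(k, \theta) \cdots b_P(k, \theta)]'$$

$$\mathbf{B}(k, \Theta) = [\mathbf{b}(k, \theta_1) \cdots \mathbf{b}(k, \theta_M)]$$

$$\mathbf{S}(k) = [S_1(k) \cdots S_M(k)]'$$

$$\mathbf{Y}(k) = [Y_1(k) \cdots Y_P(k)]'$$

Using these definitions (2) becomes

$$\mathbf{Y}(k) = \mathbf{B}(k, \Theta)\mathbf{S}(k) + \mathbf{U}(k) \qquad (3)$$

This final compact form of the measurement relation, which is
the same as the signal model of the Cramer-Rao Lower Bound
formula in [8], will be used in our derivations.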
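The compact model (3) can be illustrated with a small sketch. It is only an illustration under stated assumptions that are not in the paper: unit sensor gains ($A_i = 1$), a uniform linear array with delays measured from the array center, a propagation speed `c`, and the hypothetical function names `steering_vector`, `steering_matrix` and `noiseless_snapshot`.

```python
import cmath
import math


def steering_vector(freq_hz, theta_deg, n_sensors, spacing_m, c=1500.0):
    """b(k, theta) for unit-gain sensors: entries exp(-j 2 pi f tau_i(theta)).

    Assumption: uniform linear array, delays relative to the array center.
    """
    center = (n_sensors - 1) / 2.0
    vec = []
    for i in range(n_sensors):
        tau = (i - center) * spacing_m * math.sin(math.radians(theta_deg)) / c
        vec.append(cmath.exp(-2j * math.pi * freq_hz * tau))
    return vec


def steering_matrix(freq_hz, thetas_deg, n_sensors, spacing_m):
    """B(k, Theta), stored as a list of columns b(k, theta_l)."""
    return [steering_vector(freq_hz, th, n_sensors, spacing_m)
            for th in thetas_deg]


def noiseless_snapshot(freq_hz, thetas_deg, signals, n_sensors, spacing_m):
    """Y(k) = B(k, Theta) S(k) with the noise term U(k) set to zero."""
    cols = steering_matrix(freq_hz, thetas_deg, n_sensors, spacing_m)
    return [sum(col[i] * s for col, s in zip(cols, signals))
            for i in range(n_sensors)]


# Two sources at one frequency bin; S(k) holds their complex amplitudes.
y = noiseless_snapshot(1000.0, [35.0, -50.0], [1.0 + 0.5j, 0.3j], 6, 0.5)
print(len(y))
```

Evaluating the same functions for every DFT bin gives the full set of wide-band snapshots Y(k), 0 <= k < F.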

3.  Wide-Band  EM  Algorithm 


Since the measurement noise is modeled as normally distributed
additive noise, the probability density of the observations is
Gaussian. Hence, the log-likelihood function of the observations
has the following familiar form ($\dagger$ is the conjugate
transpose operator),

$$\mathcal{L}(\Theta, \mathbf{S}; \mathbf{Y}) = -\sum_{k=0}^{F-1} [\mathbf{Y}(k) - \mathbf{B}(k, \Theta)\mathbf{S}(k)]^{\dagger} [\mathbf{Y}(k) - \mathbf{B}(k, \Theta)\mathbf{S}(k)] \qquad (4)$$

In order to find the ML estimate, the likelihood function of the
observations should be maximized with respect to $\Theta$ and
$\mathbf{S}(k)$. However, the direct maximization of this
function is not only computationally demanding but also, due to
the local maxima structure of the likelihood function, not
guaranteed to converge to the global maximum. The Expectation
Maximization (EM) method of obtaining the ML estimate overcomes
this difficulty by an iterative search in much lower dimensional
parameter spaces [1]. The EM method requires the identification
of a so-called complete data space. In our application the
commonly used complete data is $\mathbf{X}_l(k)$, which is the
signal that would be observed at the sensors if we were able to
see the effect of the l'th source only. Then the many-to-one
mapping for all sources from the complete data space to the
incomplete data space can be written as

$$\mathbf{Y}(k) = \sum_{l=1}^{M} \mathbf{X}_l(k), \quad 0 \le k < F \qquad (5)$$

The mean of the complete data $\mathbf{X}_l(k)$ is
$\mathbf{b}(k, \theta_l) S_l(k)$ and it is normally distributed.
The log-likelihood function of the complete data is

$$\mathcal{L}_c(\Theta, \mathbf{S}; \mathbf{X}) = -\sum_{k=0}^{F-1} \sum_{l=1}^{M} \|\mathbf{X}_l(k) - \mathbf{b}(k, \theta_l) S_l(k)\|^2 \qquad (6)$$

Here, the observed signal is decomposed into M constituents.
Therefore to estimate $\theta_l$ and $S_l(k)$, only
$\mathbf{X}_l(k)$ is used besides the observation. At the n'th
iteration of the EM algorithm, the expectation step conditionally
estimates the likelihood of the complete data
$\mathcal{L}_c(\Theta, \mathbf{S} \mid \Theta^n, \mathbf{S}^n)$.
The maximization step then finds the maximizer of the estimated
likelihood and assigns it to the updated estimates
$\Theta^{n+1}$, $\mathbf{S}^{n+1}$. To find
$\mathbf{b}(k, \theta_l) S_l(k)$ it is sufficient to know
$\mathbf{X}_l(k)$, therefore in the expectation step
$\mathbf{X}_l(k)$ is estimated. It can be shown that ([6], p. 164),

$$\mathbf{X}_l^n(k) = E\{\mathbf{X}_l(k) \mid \Theta^n, \mathbf{S}^n(k), \mathbf{Y}(k)\} = \mathbf{b}(k, \theta_l^n) S_l^n(k) + \frac{1}{M} [\mathbf{Y}(k) - \mathbf{B}(k, \Theta^n)\mathbf{S}^n(k)], \quad 0 \le k < F \qquad (7)$$

In the maximization step, the complete data likelihood, which is
formed by using $\mathbf{X}_l^n(k)$, is maximized with respect to
$\theta_l$ and $S_l(k)$. The $\theta_l$ update is found as

$$\theta_l^{n+1} = \arg\max_{\theta_l} \Big\{ \max_{S_l} \; -\sum_{k=0}^{F-1} \sum_{l=1}^{M} \|\mathbf{X}_l^n(k) - \mathbf{b}(k, \theta_l) S_l(k)\|^2 \Big\} \qquad (8)$$

where there are two maximization problems nested inside one
another. If $S_l(k)$ is unknown they must be solved
simultaneously. For a given $\theta$ value, the solution of the
inner maximization is

$$S_l(k) = [\mathbf{b}^{\dagger}(k, \theta)\mathbf{b}(k, \theta)]^{-1} \mathbf{b}^{\dagger}(k, \theta)\mathbf{X}_l^n(k) = \frac{\mathbf{b}^{\dagger}(k, \theta)\mathbf{X}_l^n(k)}{\|\mathbf{b}(k, \theta)\|^2} \qquad (9)$$

Inserting this expression into (8) and solving the outer
maximization, $\theta_l^{n+1}$ is found. A line search may be
used for that maximization. Finally, at the n'th iteration of
the EM algorithm the update formulas are as follows,


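$$\theta_l^{n+1} = \arg\max_{\theta} \sum_{k=0}^{F-1} \frac{\mathbf{b}^{\dagger}(k, \theta)\mathbf{X}_l^n(k)\mathbf{X}_l^{n\dagger}(k)\mathbf{b}(k, \theta)}{\|\mathbf{b}(k, \theta)\|^2} \qquad (10)$$

$$S_l^{n+1}(k) = \frac{\mathbf{b}^{\dagger}(k, \theta_l^{n+1})\mathbf{X}_l^n(k)}{\|\mathbf{b}(k, \theta_l^{n+1})\|^2} \qquad (11)$$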

If $S_l(k)$ is known, as in active array applications, (8)
simply reduces to a single maximization problem and there
remains no need for (11). If (10) and (11) are run together,
i.e., in the case of unknown source signals, the initial
direction estimates should be close to the true direction values
for the signal estimates to converge to the true signal
waveforms.

After (10), $\Theta^{n+1}$ is available. If it is inserted into
(3), $\mathbf{S}^{n+1}(k)$ can be solved for by using a number of
alternatives. For instance the least squares (LS) solution is as
follows,

$$\hat{\mathbf{S}}(k) = \arg\min_{\mathbf{S}(k)} \|\mathbf{Y}(k) - \mathbf{B}(k, \Theta)\mathbf{S}(k)\|^2 = [\mathbf{B}^{\dagger}(k, \Theta)\mathbf{B}(k, \Theta)]^{-1}\mathbf{B}^{\dagger}(k, \Theta)\mathbf{Y}(k) \qquad (12)$$

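Regularization may be applied to the LS solution, which gives
the regularized least squares (RLS) solution,

$$\hat{\mathbf{S}}(k) = [\mathbf{B}^{\dagger}(k, \Theta)\mathbf{B}(k, \Theta) + \mu\mathbf{I}]^{-1}\mathbf{B}^{\dagger}(k, \Theta)\mathbf{Y}(k) \qquad (13)$$

The choice of $\mu$ in the regularization is important, and it
can be chosen optimally [4, 9]. Another alternative in source
signal estimation may be the following, which will be referred
to as the LSSET solution, where $\mathcal{K}$ is a set of angles
in a neighborhood of $\Theta$,

$$\hat{\mathbf{S}}(k) = \arg\min_{\mathbf{S}(k)} \int_{\mathcal{K}} \|\mathbf{Y}(k) - \mathbf{B}(k, \Theta)\mathbf{S}(k)\|^2 \, d\Theta \qquad (14)$$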


The EM algorithm starts with n = 0, at which time $\Theta^0$ is
available from a rough estimate. To find $\mathbf{X}_l^0(k)$ in
(7), $\mathbf{S}^0(k)$ is needed and it is estimated by one of
the methods mentioned above. EM shows a monotonic increase of
the likelihood and its convergence issues have been
investigated [1, 11].
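The LS and RLS signal estimates can be compared on a two-source example. The sketch below is an illustration under assumptions not in the paper: a half-wavelength, unit-gain uniform linear array (hypothetical helper `steer`), a single frequency bin, and a closed-form 2x2 solve in the hypothetical function `estimate_two_signals`. Setting `mu = 0` gives the plain LS solution and `mu > 0` the Tikhonov-regularized one.

```python
import cmath
import math


def steer(theta_deg, P):
    """Hypothetical steering vector of a half-wavelength, unit-gain ULA."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * i) for i in range(P)]


def inner(a, b):
    """Return a^dagger b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def estimate_two_signals(b1, b2, y, mu=0.0):
    """Solve (B^dagger B + mu I) s = B^dagger y for B = [b1 b2].

    mu = 0 gives the least squares solution, mu > 0 the regularized one.
    """
    g11 = inner(b1, b1) + mu
    g12 = inner(b1, b2)
    g21 = inner(b2, b1)
    g22 = inner(b2, b2) + mu
    r1, r2 = inner(b1, y), inner(b2, y)
    det = g11 * g22 - g12 * g21
    # Explicit inverse of the 2x2 Gram matrix applied to B^dagger y.
    return ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g21 * r1) / det)


P = 6
b1, b2 = steer(35.0, P), steer(-50.0, P)
s1, s2 = 1.0 + 2.0j, -0.5j
y = [x1 * s1 + x2 * s2 for x1, x2 in zip(b1, b2)]   # noise-free snapshot

ls = estimate_two_signals(b1, b2, y)
rls = estimate_two_signals(b1, b2, y, mu=1.0)
print(ls, rls)
```

On noise-free data LS recovers the amplitudes exactly, while the regularized estimate is shrunk toward zero; the benefit of regularization appears when the assumed angles are slightly wrong or the columns of B are nearly collinear.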


4.  Tree-Structured  EM 


In this section we will use a different mapping from the
complete data to the incomplete data, which is structured as a
binary tree as shown in Figure 1. $\mathbf{Y}_{i,j}(k)$ is the
intermediate incomplete data between the observation
$\mathbf{Y}(k)$ and the complete data $\mathbf{X}_l(k)$'s. In
this setting the EM algorithm is run for two sources at a time
using the intermediate data at the joint node of two leaves.
This provides an update for the corresponding DOA's and source
signals. The value of the intermediate data is found by using,
in (3), the original observation $\mathbf{Y}(k)$ and the current
source signal estimates other than the ones which are to be
updated by the current run. For instance, to run the EM
algorithm for $\mathbf{X}_1(k)$ and $\mathbf{X}_2(k)$ we form
the required incomplete data as

$$\mathbf{Y}_{2,1}(k) = \mathbf{Y}_{1,1}(k) - \mathbf{B}(k, [\theta_3 \;\; \theta_4]') \begin{bmatrix} S_3(k) \\ S_4(k) \end{bmatrix} \qquad (15)$$

where $\mathbf{Y}_{1,1}(k)$ is found by using $\mathbf{Y}(k)$
and the current estimates for the last three source signals in
(3). $\mathbf{Y}_{2,2}(k)$ may be found similarly and the EM
algorithm is run for that branch too. This may be repeated a
number of times and then, by using the updates obtained for the
first 4 source signals and DOA's, the branch of
$\mathbf{Y}_{1,2}(k)$ may be processed. The idea of putting
intermediate data mappings between $\mathbf{Y}(k)$ and the
$\mathbf{X}_l(k)$'s can be associated with that of the Cascade
EM (CEM) algorithm, but here there is more than one intermediate
data space. Due to the limited space, the generalization of CEM
to multiple levels is not presented here. The tree structure may
also be associated with the Space Alternating Generalized EM
algorithm in the sense that not all of the parameters are
updated at a time. Also, EM is run on more noisy data, reducing
the information content of intermediate observations, and this
is reported to speed the convergence [3].

Figure 1. An example for the tree structure. (Diagram labels: index, level.)

5. Simulations and Conclusions

Observation signals are obtained by simulation of a linear
array of sensors. The number of signals is assumed to be known
since there are studies in detection [10]. The source signals
are taken as coherent pulse modulated chirp signals with
bandwidth comparable to the center frequency. Noise is assumed
to be independent and identically Gaussian distributed. First
the signal estimation alternatives are inserted in the EM
algorithm and their relative performances are compared. The EM
algorithm is run for two sources impinging from 35° and -50° at
an SNR level of 0 dB. The averaged traces of the error norm of
DOA estimation, which describe the convergence behaviour, can be
seen in Figure 2, where EM, LS, RLS, LSSET refer to (11), (12),
(13), (14) respectively. The DOA error variance together with
the CRLB for each alternative is plotted in Figure 3. For LSSET,
$\mathcal{K}$ consists of 5 angles in a 1° neighborhood of the
current DOA. This turns out to be computationally less complex
than RLS.

Figure 2. Averaged traces of DOA error norms. (Plot title: DOA estimation error.)

Figure 3. DOA error variances.

To compare the tree-structured EM algorithm with the original
EM algorithm, four sources from directions
$\Theta = [35° \; -50° \; -20° \; 50°]'$ are used at
SNR = 10 dB. Initial DOA's are given as
$\Theta^0 = [33° \; -48° \; -18° \; 48°]'$.

The DOA error norms for iterations of the original EM and
tree-structured EM algorithms are shown in the next table. The
original EM algorithm could not converge to the true DOA values.
Furthermore, it diverges from the initial angle values. But
within the same number of total iterations the tree-structured
EM converges with much lower DOA error to
$\Theta = [35.3 \; -50.0 \; -20.0 \; 50.7]'$.


iteration no.      10     20     50     100
EM (10-'')         5.3    5.5    5.5    5.5
Tree-EM (lO"'*)    6.3    6.1    4.6    2.2

By  this  study,  an  improvement  on  EM  algorithm  is 
realized  not  only  by  using  robust  signal  estimation 
schemes  but  also  by  changing  the  data  mapping  of  the 
original  algorithm. 
Abstract

Traditionally, high resolution spectral Direction Of Arrival
(DOA) estimation has been associated with algorithms rather
than with a processing scheme or architecture. Motivated by a
previous work on feasible implementations of the Estimate and
Maximize algorithm [1], the authors show that the classical
bank filter approach [see 2 and its references] can achieve
similar, or even better, performance than the most
sophisticated algorithms, in terms of performance versus
complexity. In fact, the practicality and robustness required
for DOA trackers, both in radar and in mobile communication
scenarios, to alleviate data fusion and hand-over respectively,
favour the use of filter-bank or scanning beams for DOA
tracking at the expense of resolution. The tracker reported
herein preserves the low complexity and robustness of these
schemes, while achieving high resolution from the EM
architecture. The result is a low complexity tracker with
robustness against coherent sources and a resolution close to
Singular Value Decomposition (SVD) based methods.

1.  Introduction 

Motivated  by  a  previous  work  on  feasible 
implementations  of  the  Estimate  and  Maximize  algorithm 
[1],  the  authors  show  that  classical  bank  filter  approach  [2] 
can  get  similar,  even  better,  performance  than  the  most 
sophisticated  trackers  in  terms  of  performance  versus 
complexity.  The  present  summary  is  organized  as  follows: 
Section  2  goes  over  the  scanning  beam  procedures  for 
DOA  estimation  and  brings  in  the  modifications  of  interest 
in  this  work.  Next,  Section  3  brings  out  the  EM-based 
architecture  in  order  for  Section  4  to  propose  a  multiple 
source  tracker  that  uses  this  architecture  together  with  the 
beamforming  scanning  approach  briefly  described  in  the 
previous  section.  The  result  is  a  DOA  tracker  architecture 
and  algorithm  whose  robustness  and  performance  is 
associated  to  the  intrinsic  clarity  and  simplicity  of  the 
processing  scheme  and  the  DOA  algorithms  used  inside. 

* This work has been supported by PR^Nxit/CICYT: riO-^5-1022-
C05-01 and CIRIT/GENERALITAT de Cat. GRQ 93-3021


2. Scanning beam procedures for DOA estimation

In contrast to DOA detectors, which are usually based on the
SVD of the data matrix or its covariance, the oldest approach,
referred to as the bank filter approach, uses a dedicated beam
to explore the whole scenario, looking for the steering
directions where a local maximum of received power is produced.
As can be seen in Figure 1, the DOA estimator is implemented by
a steerable beam $\mathbf{a}$ ($\mathbf{s}_d(\theta)$ denotes
the steering vector, focused on angle $\theta$, to which the
beam is steered) which, followed by a power device (envelope
detector plus integration), produces the power density $\Phi$
(power/solid angle) for every search direction (the spatial
bandwidth $B_N$ is the noise bandwidth [2]). Finally, the DOA
estimate will be the maximum of the spatial power density.


Figure  1.  Scanning  beam  scheme  for  DOA 
estimation 


From the simplicity of the scheme depicted in figure 1, it can
be concluded that low complexity and robustness are the main
features of these procedures. The main criticism concerns
resolution. Any DOA estimation procedure using a beamvector to
measure power density has to face the uncertainty principle:
the product of the aperture size in wavelengths and the
beamvector bandwidth is bounded. An example is the classic
phased-array scanning procedure.

The phased-array scanning (1) can be formulated as a beamformer
$\mathbf{a}$ with 0 dB gain in the steered direction
$\mathbf{s}_d$ that minimizes the response to the
non-directional noise (with identity covariance matrix)

$\mathbf{a}^H \mathbf{s}_d = 1$  (1.a)        $\mathbf{a}^H \mathbf{s}_d = 1$  (2.a)

$\mathbf{a}^H \mathbf{a} \,|_{\min}$  (1.b)        $\mathbf{a}^H \mathbf{R} \mathbf{a} \,|_{\min}$  (2.b)


0-8186-7576-4/96  $5.00  ©  1996  IEEE 
The phased-array response is distorted whenever non-uniform
spatial noise or source distributions have to be dealt with. To
alleviate in part the resulting leakage problem, the so-called
Capon beamformer is designed in a data-dependent fashion. The
Capon beamformer adapts to the spectral content of the input
process at each DOA of interest. For each scan direction, it
reduces the interference contributions to noise level. Its
basic formulation is shown in (2), where R is the data
correlation matrix measured from the snapshot vector x.
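The two scanning beams can be compared numerically. The sketch below is an illustration under assumptions not in the paper: a half-wavelength uniform linear array (hypothetical helper `steer`), an exact correlation matrix with one source in white noise, and the standard closed-form solutions of the two constrained problems, namely a = s/(s^H s) for the phased array and a = R^{-1}s/(s^H R^{-1} s) for Capon. A small Gaussian-elimination routine stands in for a linear algebra library.

```python
import cmath
import math


def steer(theta_deg, P):
    """Hypothetical steering vector of a half-wavelength ULA."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(P)]


def inner(a, b):
    """Return a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def matvec(R, v):
    return [sum(r * x for r, x in zip(row, v)) for row in R]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def phased_array_power(R, s):
    """Output power a^H R a for a = s / (s^H s)."""
    return inner(s, matvec(R, s)).real / inner(s, s).real ** 2


def capon_power(R, s):
    """Output power a^H R a = 1 / (s^H R^{-1} s) for the Capon beam."""
    return 1.0 / inner(s, solve(R, s)).real


# One source of power p at 20 degrees in white noise of power sigma2.
P, sigma2, p = 6, 1.0, 10.0
s0 = steer(20.0, P)
R = [[p * s0[i] * s0[j].conjugate() + (sigma2 if i == j else 0.0)
      for j in range(P)] for i in range(P)]

grid = range(-90, 91)
pa = {th: phased_array_power(R, steer(th, P)) for th in grid}
cp = {th: capon_power(R, steer(th, P)) for th in grid}
print(max(pa, key=pa.get), max(cp, key=cp.get))
```

Both scans peak at the source direction with the same output power, while away from it the Capon output is never larger than the phased-array output, which is the leakage reduction described above.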

Nevertheless, the point of interest for the tracker described
hereafter is that both approaches, i.e. the data-independent
(phased array) and the data-dependent (Capon) beamvector, are
suitable for introducing additional constraints. Specifically,
the modification we are interested in arises when a given
direction $\mathbf{s}_0$ has to be nulled out in order to reduce
leakage in any $\mathbf{s}_d$ due to the potential presence of
an interference source at $\mathbf{s}_0$. The resulting
beamvector comes from the following formulations (3) and (4)

$\mathbf{a}^H [\mathbf{s}_d \;\; \mathbf{s}_0] = (1 \;\; 0)$  (3.a)        $\mathbf{a}^H [\mathbf{s}_d \;\; \mathbf{s}_0] = (1 \;\; 0)$  (4.a)

$\mathbf{a}^H \mathbf{a} \,|_{\min}$  (3.b)        $\mathbf{a}^H \mathbf{R} \mathbf{a} \,|_{\min}$  (4.b)

where (3) and (4) start from the phased-array and the Capon
beamformer philosophy respectively. The corresponding
beamvectors are easily derived by means of Lagrange
multipliers. For its simplicity, we pay special attention to
the beamvector that is derived from (3) and formulated in (5).

$$\mathbf{a} = \mathbf{A} [\mathbf{A}^H \mathbf{A}]^{-1} (1; 0) = \mathbf{A}^{\#} (1; 0) \qquad (5)$$

where $\mathbf{A} = [\mathbf{s}_d \;\; \mathbf{s}_0]$ and #
stands for the pseudo-inverse.
This  minimum  norm  beamvector  leads  to  minimum  loss  of 
desired  signal  response  if  the  coefficients  ai  are  achieved  by 
attenuation  and  to  smallest  sensitivity  to  errors  in 
construction.  Additionally,  its  design  is  completely  data 
free.  It  is  also  interesting  to  note  that  if  the  steered  direction 
Sd  is  the  same  as  the  desired  source  direction,  the 
beamvector  formulated  in  (5)  offers  the  Deterministic 
Maximum  Likelihood  estimate  of  the  signal  waveform  ^ 
coming  from  that  source. 

e  =  A*(l;0)x  (6) 
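The data-free beamvector with one look direction and one null can be checked numerically. This is a minimal sketch under assumptions not in the paper: a half-wavelength uniform linear array (hypothetical helper `steer`) and the minimum-norm solution a = A (A^H A)^{-1} (1; 0) with A = [s_d s_0], written out with the explicit inverse of the 2x2 Gram matrix.

```python
import cmath
import math


def steer(theta_deg, P):
    """Hypothetical steering vector of a half-wavelength ULA."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(P)]


def inner(a, b):
    """Return a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def null_constrained_beam(s_d, s_0):
    """Minimum-norm a with a^H s_d = 1 and a^H s_0 = 0."""
    g11, g12 = inner(s_d, s_d), inner(s_d, s_0)
    g21, g22 = inner(s_0, s_d), inner(s_0, s_0)
    det = g11 * g22 - g12 * g21
    # First column of (A^H A)^{-1} is (g22, -g21) / det.
    return [(g22 * x - g21 * y) / det for x, y in zip(s_d, s_0)]


P = 8
s_d, s_0 = steer(10.0, P), steer(25.0, P)
a = null_constrained_beam(s_d, s_0)

gain_desired = inner(a, s_d)   # a^H s_d
gain_null = inner(a, s_0)      # a^H s_0
print(abs(gain_desired), abs(gain_null))
```

The design uses only the two steering vectors, so it can be precomputed for every pair of look and null directions without touching the data.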

We recall the importance of the noise bandwidth ($B_N$)
normalization in order to get a reliable DOA estimate from
the spatial power density in (3) instead of directly using
the spatial power.

$$\Phi = \mathbf{a}^H \mathbf{R} \mathbf{a} / B_N \qquad (7)$$

To  be  more  specific,  spatial  bandwidth  may  introduce 
substantial  power  leakage  from  sources  or  directional  noise 
impinging  on  the  aperture  from  other  directions  than  the 
desired  one. 

It  is  important  to  remark  the  robustness  and  low 
complexity  of  both  procedures  associated  with  the 
principle  of  the  beamvector  scanning.  The  only  problem 
they  face  is  spatial  frequency  leakage  or  resolution  loss  for 
the  multiple  source  case. 


Next, we will propose to use either (3) or (4) in an EM
based  architecture  to  provide  a  high  resolution  tracker  yet 
preserving  the  low  complexity  and  robustness  previously 
mentioned. 

3.  The  EM-based  architecture 

After a detailed examination of the EM algorithm [1], both in
the deterministic and the stochastic approach, the Estimate
step can be viewed as a blocking step where the multiparameter
estimation problem is reduced to a single parameter estimation.
More specifically, the steps Estimate and Maximize, when
implemented in a signal processing architecture for DOA
estimation, can be renamed as blocking and single source
estimation respectively. In other words, given the original
data snapshot $\mathbf{x}_n$ containing NS sources, the blocking
step produces NS snapshots $\mathbf{x}_{n,k}$ (k = 1, ..., NS)
such that a single source is relevant in every snapshot or, at
least, the other sources are highly attenuated with respect to
this source.

In consequence, the blocking step could be implemented as NS
matrices $\mathbf{B}_k$ (k = 1, ..., NS) that produce from
$\mathbf{x}_n$ the single source snapshot

$$\mathbf{x}_{n,k} = \mathbf{B}_k \mathbf{x}_n \quad (k = 1, \ldots, NS) \qquad (8)$$

As the blocking step requires the source DOAs, it is necessary
to feed the DOAs obtained in the second stage back to the first
or blocking stage. This is the other main feature of the EM and
EM-based algorithms. As the reader can observe in figure 2, the
maximum at the output of each branch governs the nulls of the
other branches (i.e. cross-feedback). This fact prevents two or
more branches from collapsing into the same angle estimate.


Figure 2. The EM-based architecture with the blocking stage
followed by single source DOA detectors. (Block labels:
Blocking; Single source DOA estimation.)

The resulting architecture, depicted in figure 2, can be found
in detail in [3], where the links with the deterministic and
stochastic EM algorithm are presented in full. The purpose of
this work is to use this architecture together with the
beamforming scanning approach briefly described in the previous
section.

From now on, the presentation will be reduced to the two source
case for a linear array. As the reader may conclude, there are
no formal difficulties in extending the application to the case
of planar arrays or to the case of multiple sources.
Nevertheless, in a radio communication scenario, the
probability of more than two users demanding Space Diversity
Multiple Access (SDMA) is very low; in consequence, the case of
two active sources is the closest to real scenarios.


Figure  3.  A  DOA  tracker  with  beamformer 
scanning  procedures  in  an  EM-based 
architecture. 

Finally,  and  going  back  to  the  architecture  of  figure  2, 
it  should  be  mentioned  that  both  steps  can  be  implemented 
as  a  single  one  when  the  procedures  shown  in  (3)  and  (4) 
are  used.  These  procedures  allow  the  packing  of  both  steps 
in  a  single  one;  since,  being  single  source  estimates,  they 
include  the  blocking  of  DOAs  a  priori  selected.  This 
proposed  architecture  is  the  one  depicted  in  figure  3. 

The next section explains how a two source DOA
detector/tracker based on this architecture works.

4. The proposed multiple source tracker

4.1. A DOA tracker architecture and algorithm

In the EM-based architecture that is depicted in figure 3, two
source DOA's are produced: $\hat{\theta}_{1,n-M}$ and
$\hat{\theta}_{2,n-M}$. Initially, the data correlation matrix
$\mathbf{R}_n$, which is required to compute the spatial power
density $\Phi_i$, is initialized with a number of snapshots
equal to ten times the number of sensors, and afterwards this
matrix is updated during M snapshots following the rule:

$$\mathbf{R}_{n-M+k} = \rho \mathbf{R}_{n-M} + (1 - \rho)\, \mathbf{x}_{n-M+k}\, \mathbf{x}_{n-M+k}^H \qquad (9)$$

with M equal to $1/(1 - \rho)$. This interval M is the number of
samples between successive updates of the DOAs provided by the
system. Its choice is a trade-off between radial source
velocity and scanning time.
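The exponentially weighted correlation update can be sketched in a few lines. This is an illustration under an assumption: the rule is applied recursively, one snapshot at a time, with a forgetting factor written `rho`; the function name `update_correlation` is hypothetical.

```python
def update_correlation(R, x, rho):
    """Return rho * R + (1 - rho) * x x^H for one snapshot x."""
    n = len(x)
    return [[rho * R[i][j] + (1.0 - rho) * x[i] * x[j].conjugate()
             for j in range(n)] for i in range(n)]


rho = 0.9
M = round(1.0 / (1.0 - rho))   # update interval, here 10 snapshots

# Two-sensor toy example with a constant snapshot.
R = [[0j, 0j], [0j, 0j]]
x = [1 + 1j, 2 - 1j]
for _ in range(M):
    R = update_correlation(R, x, rho)
print(M, R[0][0])
```

With a constant snapshot the estimate approaches x x^H geometrically, reaching a fraction 1 - rho^M of it after M updates, which shows how the interval M and the memory of the estimator are tied together.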

To make the system robust to bad initializations in any kind of scenario (i.e. sources of very different power, possibly even strongly correlated), only one branch is started initially. Once this branch has detected one source DOA, this DOA can then drive the null of the second branch. In this way, both branches cannot collapse into the same source DOA. If both DOAs are far enough apart, both branches in figure 3 can then begin to scan in parallel. Next, the procedure to update each angle estimate is described.

During the mentioned $M$ snapshots, and in the case of the data-independent design (see 3), the beamformers are obtained as shown in (10) for the beamformer labelled 1 in figure 3, and in the same manner for beamformer 2:

$$\mathbf{a}_{1n}^{H}\,[\,\mathbf{s}(\theta)\ \ \mathbf{s}(\hat{\theta}_{2,n})\,] = (1\ \ 0), \qquad \mathbf{a}_{1n}^{H}\mathbf{a}_{1n}\big|_{\min} \qquad (10)$$

Note that for the beamformer labelled 1, the second branch drives its null at $\hat{\theta}_{2,n}$, and in the same manner for beamformer 2.
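As a rough sketch of this design step, the following computes a beamvector with unit gain towards a look direction and a null at the DOA held by the other branch. Several things here are assumptions, not taken from the paper: a uniform linear array with half-wavelength spacing, the minimum-norm solution of the two constraints, and the hypothetical names `steering`, `inner` and `nulling_beamformer`.

```python
import cmath
import math


def steering(theta_deg, n_sensors, spacing=0.5):
    """Steering vector of a uniform linear array (spacing in wavelengths, assumed)."""
    phase = 2.0 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * m) for m in range(n_sensors)]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def nulling_beamformer(theta_look, theta_null, n_sensors):
    """Minimum-norm a with a^H s(look) = 1 and a^H s(null) = 0."""
    s1 = steering(theta_look, n_sensors)
    s2 = steering(theta_null, n_sensors)
    g11, g12 = inner(s1, s1), inner(s1, s2)
    g21, g22 = inner(s2, s1), inner(s2, s2)
    det = g11 * g22 - g12 * g21
    # First column of the inverse of the 2x2 Gram matrix S^H S.
    c1, c2 = g22 / det, -g21 / det
    return [c1 * x + c2 * y for x, y in zip(s1, s2)]


# Illustrative angles: look at 20 degrees, null at -10 degrees, 8 sensors.
a1 = nulling_beamformer(20.0, -10.0, 8)
gain_look = inner(a1, steering(20.0, 8))
gain_null = inner(a1, steering(-10.0, 8))
```

The two directions must be distinct; otherwise the Gram matrix is singular, which mirrors the requirement that the two branches must not collapse onto the same DOA.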

Once the beamformers have been designed, they scan on $\mathbf{s}$ as in (11), where the spatial bandwidth $B_n$ has been approximated by the norm of the beamvector $\mathbf{a}_{in}$ [3]; the new estimate will be the DOA that maximizes the estimated spatial power density:

$$\hat{\theta}_{1,n} = \arg\max_{\theta}\,\Phi_{1n}(\theta) \qquad (12)$$

We remark that, in order to save time and computational burden, the DOAs that are scanned on $\mathbf{s}$ can be restricted to those close to the previous $\hat{\theta}_{i,n-M}$. In that case, however, the system is no longer a detector but just a tracker.

Note that, during the acquisition period $M$, the architecture may iterate over (10)-(12) in the same fashion as in the original EM algorithm. It should be pointed out that, for the iterations to be useful, a high-precision scan through $\mathbf{s}$ is required, and that in the end they may face the upper bound on resolving two close sources from a data correlation estimate taken over an interval of $M$ snapshots. At least two iterations will be necessary in any case.

IV.2.  Tracking  subsystem 

The concept of a global tracker includes not only the DOA detection scheme, but also the parameter filtering that makes it possible to cope with occasional fadings of bounded time duration, as may occur when the radial trajectories of two targets cross. This additional processing usually contains two further stages: time-trackers for each parameter, and data fusion or, in some cases, image processing of the two images produced by the parameter values versus time.

In most cases, the DOA detection scheme (we have just described an alternative in the previous section) does not benefit from the powerful processing that follows it in the global tracker system. We comment on this to state that the blocking plus estimation scheme described here can take great advantage of the tracker subsystem. We will refer to it hereafter as parameter tracking, since most of the success of the DOA estimation is based on adequate nulling or inhibition of the non-steered sources.

Once  the  detected  angle  has  stabilised  at  each  of  the 
branches,  an  elevation  tracker  is  used  in  the  scheme  of 
figure  3  (insertion  point  K).  In  this  way,  the  performance 
of  the  system  may  improve  since  tracking  and  prediction 
of  "next"  location  is  of  capital  importance  in  inhibition  or 
blocking. 

This work has employed the most basic parameter tracker that can be used: a Kalman filter tracking radial position $\theta_{ie,n-M}$ and velocity $v_{ie,n-M}$. A complete description can be found in [4]. Next, we comment only on some specific aspects of the state equation and the measurement equation. The state model is


where $\mathbf{w}_n$ is the uncertainty (associated with the maneuverability of the sources) with covariance matrix $\mathbf{Q}$. The measurement model is

$$\hat{\theta}_{i,n} = \theta_{ie,n} + v_n \qquad (14)$$

where $\hat{\theta}_{i,n}$ is the estimate produced after the detection and $v_n$ is the noise in the observation of the elevation angle. This noise, of covariance $C_{vn}$, is due to the air interface, down-conversion mismatch, noise and DOA detection errors.
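A position/velocity Kalman tracker of this kind can be sketched as follows. The state equation is not legible in the text above, so this sketch assumes a standard constant-velocity model (angle advances by velocity times the update interval); the function name `kalman_step`, the diagonal process covariance and all numeric values are illustrative assumptions, not the authors' settings.

```python
def kalman_step(state, P, z, q_diag, c_v, dt=1.0):
    """One predict/update cycle for state (angle, velocity) and scalar angle measurement z."""
    theta, vel = state
    # Prediction with the assumed constant-velocity model F = [[1, dt], [0, 1]].
    theta_p = theta + dt * vel
    vel_p = vel
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q_diag[0]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q_diag[1]
    # Update: only the angle is observed, with measurement variance c_v.
    s = p00 + c_v
    k0, k1 = p00 / s, p10 / s
    innovation = z - theta_p
    new_state = (theta_p + k0 * innovation, vel_p + k1 * innovation)
    new_P = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return new_state, new_P


# Illustrative run: noiseless angle ramp starting at 20 with slope 0.01 per update.
state, P = (20.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]
for n in range(200):
    state, P = kalman_step(state, P, 20.0 + 0.01 * n, (1e-4, 1e-5), 0.25)
```

Because the filter also estimates velocity, its one-step prediction gives the "next" location that the text describes as important for inhibition or blocking.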

Both  covariance  Q  and  Cy  have  to  be  matched  to  the 
specific  application.  In  our  work  we  have  set  Q  to  diagonal 
(  10''*  10'^  )  for  a  mobile  communication  scenario.  We 
have commented before that, during the acquisition period $M$, the architecture of figure 2 may iterate over (10)-(12). During these $L$ iterations, each of the detectors adjusts its DOA estimate until it stabilizes. Then, the Kalman sub-system filters the noise out of these estimates and produces $\hat{\theta}_{ie,n}$. Therefore, the measurement covariance $C_{vn}$ can be estimated as in (15), that is, as the error power between the angle predicted by the Kalman filter, $\hat{\theta}_{ie,n}$, and the angle detected, $\hat{\theta}_{i,n}$, over $L$ realizations:

$$C_{vn} = (1 - 1/L) \sum_{k=1}^{L} \left| \hat{\theta}_{i,n}^{(k)} - \hat{\theta}_{ie,n} \right|^{2} \qquad (15)$$

V.  Simulations  and  Conclusions 

In order to validate the proposed DOA detection/tracking technique, two simulations have been conducted. Figures 4 and 5 show the case of two moving sources tracked by the system of figure 3, where the beamformers $\mathbf{a}_i$ are simple phased arrays that follow the design rule of (10) and where the Kalman sub-system is incorporated. First, figure 4 shows the performance of the system in a scenario of two sources received with very different powers: 15 and 5 dB respectively. The uniform linear array consists of 8 sensors and each scan is carried out after 30 snapshots. Radial velocity is 0.01/snapshot and an initial angle estimate of 20° has been considered.

Figure 4. Two sources of [15 5] dB. Tracking by an 8 sensor array. Each scan consists of 30 snapshots.

Next, figure 5 shows the same scenario but with two fully coherent sources.

Figure 5. Two fully coherent sources of [15 5] dB. Tracking by an 8 sensor array. Each scan consists of 30 snapshots.

The proposed technique offers a good trade-off between performance on the one hand and complexity and cost on the other. Its robustness is associated with the intrinsic clarity and simplicity of the processing scheme and of the DOA algorithm used inside.

Future work will consider the impact of deviations in element locations, mutual coupling and quantization effects when using digitally controlled attenuators and shifters.

Abstract

In several adaptive array application areas the Gaussian distribution has not proven to be an accurate model of the measured data. Nevertheless, Gaussian based processors have demonstrated robust performance in spite of this statistical mismatch. A need therefore exists for the consideration of (i) problem reformulation and (ii) performance analysis in non-Gaussian environments. The theory of complex multivariate elliptically contoured (MEC) distributions provides an attractive theoretic framework for these considerations, especially in the adaptive array setting. We replace the Gaussian data assumption with one of MEC distributed data and reexamine the optimality and performance of widely used adaptive detection and beamforming structures.

I.  Introduction 

In several radar/sonar and other array application areas the Gaussian distribution is an inadequate model for measured data (see [5] and its bibliography). Thus, extending classical array processing to non-Gaussian distributions has been a longstanding desire of the array community. In addition, the observed robustness of Gaussian based processors in some non-Gaussian environments motivates the need for statistical performance analyses of these processors which address, in a relatively general sense, a plurality of possible non-Gaussian distributions. Such analyses are of special interest in adaptive array scenarios, which often involve estimation of the data covariance via the sample covariance matrix (SCM). The theory of complex multivariate elliptically contoured (MEC) distributions provides one vehicle for performing such analyses. MEC distributions represent a fairly attractive set of data models for the adaptive array scenario for several reasons: (1) similar models have had success, both historically and currently, under the guise of spherically invariant random vectors (SIRVs, or processes, SIRPs) and Gaussian mixtures [5]; (2) MECs provide a theoretic framework which (i) allows one to address a plurality of non-Gaussian distributions simultaneously, and (ii) provides for optimal estimation of the (typically unknown) data covariance; (3) MECs often allow for tractable performance analyses in the presence of the SCM, which has historically been a limiting factor when deviating from the Gaussian assumption.

In this paper we replace the classic assumption of data normality with one of MEC distributed data and reexamine important Gaussian based results of adaptive array detection and signal estimation. In particular, Kelly's generalized likelihood ratio test (GLRT) [1] and Robey's adaptive matched filter (AMF) [2] are shown to be detection structures which arise not necessarily from Gaussianity per se, but rather as by-products of the elliptical symmetry which the Gaussian happens to possess. Indeed, it is shown that a large class of MEC distributions leads to the same detection structures. The probability of false alarm (PFA) and constant false alarm rate (CFAR) loss relative to the complex Gaussian of known covariance are shown to be invariant over the MEC class. Concerning adaptive beamforming, exact statistical analyses of the sample covariance based (SCB) linearly constrained minimum variance (LCMV) beamformer and its SCB generalized sidelobe canceller (GSC) implementation, which include pdfs for their weightings, beam responses, and beamformer outputs, are given. All results suggest significant robustness implications for adaptive array processing in non-Gaussian environments.

II.  Array  Processing 

In array processing the multisensor array data is often modeled by the following vector observation (all vectors are complex)

$$\mathbf{x}_{(N\times 1)} = \mathbf{G}_{(N\times E)}\,\mathbf{s}_{(E\times 1)} + \mathbf{n}_{(N\times 1)} = \sum_{i=1}^{E} \mathbf{g}_i s_i + \mathbf{n}. \qquad (1)$$

The dimensions of the corresponding matrices are indicated in subscript. $\mathbf{x}$ is the received array data with covariance $\mathbf{R}$, containing the desired signal vector $\mathbf{s} = [s_1, s_2, \ldots, s_E]^T$ ($T$ denotes matrix transpose and $H$ the conjugate transpose), and $\mathbf{n}$ is additive noise. The columns of the matrix $\mathbf{G} = [\mathbf{g}_1|\mathbf{g}_2|\cdots|\mathbf{g}_E]$ model the system transfer functions, also known as the steering vector(s). It is assumed that $\mathbf{G}$ is full column rank and known exactly.

In this paper we consider the problems of adaptive array detection/estimation and adaptive beamforming in the specific class of MEC non-Gaussian environments.

III.  MEC  Distributed  Data 

Consider the $N \times (L+1)$ data matrix $\mathbf{X}_0 = [\mathbf{x}_1|\mathbf{x}_2|\cdots|\mathbf{x}_L|\mathbf{x}] = [\mathbf{X}|\mathbf{x}]$, which has as its first $L$ columns training data and as its last column the primary array snapshot under interrogation. Traditionally the columns of $\mathbf{X}_0$ are assumed independent Gaussian; however, the Gaussian distribution is one member of a broad class of distributions known as elliptically contoured (EC) distributions, which likewise often allow for tractable analysis. Multivariate EC distributions extend classical Gaussian based sampling theory of multivariate statistical analysis to the case of observations which are dependent and/or drawn from nonnormal populations [10], [11]. Note the following definitions:

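Definition 1: If a complex random vector $\mathbf{h}_{N\times 1}$ has a characteristic function (c.f.) of the form

$$E\{e^{j\,\mathrm{Re}(\mathbf{t}^{H}\mathbf{h})}\} = \exp[j\,\mathrm{Re}(\mathbf{t}^{H}\mathbf{m})]\cdot\phi(\mathbf{t}^{H}\mathbf{R}\,\mathbf{t}) \qquad (2)$$

where $E\{\cdot\}$ is the expectation, $\mathbf{m}$ is $N \times 1$, $\mathbf{R}$ is $N \times N$ positive semi-definite ($\geq 0$), and $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote real and imaginary parts, we say that $\mathbf{h}$ is complex EC distributed with parameters $\mathbf{m}$, $\mathbf{R}$, $\phi$, and we denote this by

$$\mathbf{h} \sim \mathcal{CEC}_N(\mathbf{m}, \mathbf{R}, \phi). \qquad (3)$$

If the density function of $\mathbf{h}$ exists (is nondegenerate) then it necessarily has the form

$$|\mathbf{R}|^{-1}\, g[(\mathbf{h}-\mathbf{m})^{H}\mathbf{R}^{-1}(\mathbf{h}-\mathbf{m})] \qquad (4)$$

where $|\cdot|$ denotes the matrix determinant.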

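Definition 2: A multivariate EC (MEC) distribution generalizes the vector EC to the case of a matrix, and is similarly defined. An $N \times L$ matrix $\mathbf{H} = [\mathbf{h}_1|\mathbf{h}_2|\cdots|\mathbf{h}_L]$ whose c.f. has the form

$$E\left(\exp\{j\,\mathrm{Re}[\mathrm{tr}(\mathbf{T}^{H}\mathbf{H})]\}\right) = \exp\left(j\,\mathrm{Re}[\mathrm{tr}(\mathbf{T}^{H}\mathbf{M})]\right) \qquad (5)$$

$$\times\,\phi(\mathbf{t}_1^{H}\mathbf{R}_1\mathbf{t}_1 + \mathbf{t}_2^{H}\mathbf{R}_2\mathbf{t}_2 + \cdots + \mathbf{t}_L^{H}\mathbf{R}_L\mathbf{t}_L) \qquad (6)$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace, $\mathbf{T} = [\mathbf{t}_1|\mathbf{t}_2|\cdots|\mathbf{t}_L]$ is $N \times L$, and $\mathbf{R}_i \geq 0$ for $i = 1, 2, \ldots, L$, is said to be complex MEC and we write

$$\mathbf{H} \sim \mathcal{CMEC}_{N\times L}(\mathbf{M}; \mathbf{R}_1, \mathbf{R}_2, \cdots, \mathbf{R}_L; \phi). \qquad (7)$$

If the density of $\mathbf{H}$ exists, then it has the form

$$\left[\prod_{i=1}^{L}|\mathbf{R}_i|\right]^{-1} \times\, g\left[\sum_{i=1}^{L}\mathrm{tr}\,\mathbf{R}_i^{-1}(\mathbf{h}_i-\mathbf{m}_i)(\mathbf{h}_i-\mathbf{m}_i)^{H}\right]$$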

Definition 3: When $\mathbf{M} = [\mathbf{m}|\mathbf{m}|\cdots|\mathbf{m}]$ and $\mathbf{R}_1 = \mathbf{R}_2 = \cdots = \mathbf{R}_L = \mathbf{R}$ in definition 2, then $\mathbf{H}$ is said to be complex Liouville EC (LEC) distributed and we write

$$\mathbf{H} \sim \mathcal{CLEC}_{N\times L}(\mathbf{m}, \mathbf{R}, \phi). \qquad (8)$$

The functional form of $\phi(\cdot)$ uniquely determines $g(\cdot)$ and distinguishes one type of EC/MEC distribution from another. For example, if $\phi(u) = e^{-u}$ then $g(u) \propto e^{-u}$ and $\mathbf{H}$ is a complex Gaussian data matrix with columns independent and identically distributed as $\mathcal{CN}(\mathbf{0}, \mathbf{R})$.
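As a worked special case for a single vector, substituting an exponential $g$ into the density form (4) gives the familiar complex normal pdf. The normalization $\pi^{-N}$ used below is the standard one for the complex Gaussian and is an assumption here, since the text above only states proportionality:

```latex
g(u) = \pi^{-N} e^{-u}
\quad\Longrightarrow\quad
|\mathbf{R}|^{-1}\, g\!\left[(\mathbf{h}-\mathbf{m})^{H}\mathbf{R}^{-1}(\mathbf{h}-\mathbf{m})\right]
= \frac{1}{\pi^{N}\,|\mathbf{R}|}
  \exp\!\left[-(\mathbf{h}-\mathbf{m})^{H}\mathbf{R}^{-1}(\mathbf{h}-\mathbf{m})\right]
```

Other choices of $g$ keep the same elliptical contours, set by the quadratic form in $\mathbf{R}^{-1}$, and change only how the density decays along them.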

IV.  Adaptive  Array  Detection 

In adaptive array detection, signal presence is sought in the single vector snapshot $\mathbf{x}$ called the primary data vector. One of the two following hypotheses is true:

$$H_0: \mathbf{x} = \mathbf{n}, \quad \text{or} \quad H_1: \mathbf{x} = \mathbf{G}\mathbf{s} + \mathbf{n}; \qquad (9)$$

either the primary data vector is simply noise only (under the null hypothesis $H_0$), or it contains a target signal plus noise (under hypothesis $H_1$). It is assumed that $\mathbf{s}$ and $\mathbf{R}$ are unknown and that a secondary data set (or training set) $\mathbf{X} = [\mathbf{x}_1|\cdots|\mathbf{x}_L]$ is available to help compensate for ignorance of these nuisance parameters. The target-free snapshots $\mathbf{x}_i$ are zero mean and share the same covariance as the primary data vector, i.e. $\mathrm{cov}(\mathbf{x}_i) = \mathbf{R}$ for $i = 1, \ldots, L$. The decision about signal presence is based on the totality of the data summarized by the $N \times (L+1)$ data matrix $\mathbf{X}_0 = [\mathbf{X}|\mathbf{x}]$.

A.  The  GLRT  and  AMF  Detectors 

Under both hypotheses and throughout the remainder of this paper we assume that $L > N$ and that the data matrix $\mathbf{X}_0$ is MEC distributed with a density of the form

$$|\mathbf{R}|^{-(L+1)}\, g[\,\mathrm{tr}\,\mathbf{R}^{-1}(\mathbf{X}_0 - \mathbf{M}_i)(\mathbf{X}_0 - \mathbf{M}_i)^{H}\,] \qquad (10)$$

$i = 0, 1$, where under the corresponding hypotheses we have

$$H_0: \mathbf{M}_0 = \mathbf{0}_{N\times(L+1)}, \qquad H_1: \mathbf{M}_1 = [\,\mathbf{0}_{N\times L}\,|\,\mathbf{G}\mathbf{s}\,]. \qquad (11)$$

(Note that this data model includes the complex Gaussian considered by Kelly [1] when we choose $g(r) = e^{-r}\pi^{-N(L+1)}$.) Assuming this data distribution, we follow two theoretic/heuristic approaches to the adaptive array detection problem; namely, (i) that outlined by the GLRT procedure, and (ii) the heuristic AMF approach.

B. Key Results on Detection

• (1) Taking the same GLRT approach as Kelly [1], we find that the resulting GLRT decision statistic is mathematically unchanged under this class of complex MEC distributions.

• (2) Following the heuristic AMF approach [2] likewise leads to the same detection structure one would obtain under the data Gaussianity assumption.

• (3) For both the GLRT and the AMF, the PFA is invariant over the complex MEC class, i.e. it does not depend on $g(\cdot)$. Hence, the CFAR loss relative to the Gaussian is the same.

• (4) For both detectors the PD is dependent on the functional form of the density given by $g(\cdot)$.

• (5) The SCB LCMV beamformer remains the maximum-likelihood (ML) estimate of the signal parameters $\mathbf{s}$. (See [6], [8]).

V.  Adaptive  Beamforming 

A.  Clairvoyant  Beamformer  Weightings 

If $\mathbf{R}$ is known exactly, then under the Gaussian assumption the signal estimates and beamformer outputs are respectively of the form

$$\hat{\mathbf{s}} = \mathbf{W}^{H}\mathbf{x} \quad \text{and} \quad y = \mathbf{w}^{H}\mathbf{x} \qquad (12)$$

[9]. The specific weightings performing the linear transformations on the data $\mathbf{x}$ for the clairvoyant ($\mathbf{R}$ known) array processors considered in this paper are summarized by

Clairvoyant Weightings

ML: $\mathbf{W}_{ML} = \mathbf{R}^{-1}\mathbf{G}(\mathbf{G}^{H}\mathbf{R}^{-1}\mathbf{G})^{-1}$

LCMV: $\mathbf{w}_{LCMV} = \mathbf{W}_{ML}\mathbf{f}$ (also MVDR)

GSC: $\mathbf{w}_{GSC} = \mathbf{W}_{GSC}\mathbf{f}$, with
$\mathbf{W}_{GSC} = [\mathbf{I}_N - \mathbf{G}_{\perp}\boldsymbol{\Omega}_{GSC}]\,\mathbf{G}(\mathbf{G}^{H}\mathbf{G})^{-1}$ and
$\boldsymbol{\Omega}_{GSC} = (\mathbf{G}_{\perp}^{H}\mathbf{R}\,\mathbf{G}_{\perp})^{-1}\mathbf{G}_{\perp}^{H}\mathbf{R}$.

B. SCB Beamformer Weightings

Typically $\mathbf{R}$ is an unknown parameter which must be estimated from a secondary data set. The common heuristic procedure is simply to replace $\mathbf{R}$ with the SCM, which we denote by $\hat{\mathbf{R}}$. This method leads to the following SCB processors:

SCB Weightings

ML: $\hat{\mathbf{W}}_{ML} = \hat{\mathbf{R}}^{-1}\mathbf{G}(\mathbf{G}^{H}\hat{\mathbf{R}}^{-1}\mathbf{G})^{-1}$

LCMV: $\hat{\mathbf{w}}_{LCMV} = \hat{\mathbf{W}}_{ML}\mathbf{f}$ (also MVDR)

GSC: $\hat{\mathbf{w}}_{GSC} = \hat{\mathbf{W}}_{GSC}\mathbf{f}$, with
$\hat{\mathbf{W}}_{GSC} = [\mathbf{I}_N - \mathbf{G}_{\perp}\hat{\boldsymbol{\Omega}}_{GSC}]\,\mathbf{G}(\mathbf{G}^{H}\mathbf{G})^{-1}$ and
$\hat{\boldsymbol{\Omega}}_{GSC} = (\mathbf{G}_{\perp}^{H}\hat{\mathbf{R}}\,\mathbf{G}_{\perp})^{-1}\mathbf{G}_{\perp}^{H}\hat{\mathbf{R}}$.

The hat accent is used to denote the SCM as well as the dependence of the weightings and other quantities on the SCM via the above heuristic procedure. Although originally a heuristic procedure, we show that these SCB beamformers are optimal in the ML sense under the MEC distributed assumption for $\mathbf{X}_0$ given by eq. (10) when maximizing over both parameters $\mathbf{s}$ and $\mathbf{R}$ [6].

C. Key Results on Beamforming

C.1 SCB Weightings

The key result is a unified stochastic representation of the SCB weightings. All of these weightings can be written equal in distribution to their clairvoyant counterparts plus a stochastic term:


where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{d}$ are deterministic matrices/vectors and $\mathbf{T}$ is an $(N - E) \times E$ random matrix with pdf


(14)


a standardized complex multivariate t-distribution. $k$ is the normalizing constant of the pdf. This stochastic representation is completely independent of the functional form of $g$ (or $\phi$), and is used to derive exact means, covariances, and pdfs for the beam responses, beamformer outputs, and signal estimates which result from these SCB weightings [7], [8].

C.2 SCB Beam Responses

All SCB beam responses $\hat{\mathbf{b}}(\theta,\omega) = \hat{\mathbf{W}}^{H}\mathbf{d}_{\theta}(\omega)$ or $\hat{b}(\theta,\omega) = \hat{\mathbf{w}}^{H}\mathbf{d}_{\theta}(\omega)$ can also be written equal in distribution to their clairvoyant counterparts plus a noise term:


where $\mathbf{A}_0$ and $\mathbf{B}_0$ are deterministic and $\mathbf{t}$ ($E \times 1$) and $t$ are random with pdfs that are special cases of eq. (14). This representation is likewise invariant over the class of complex MECs considered.

C.3 Beamformer Outputs

The SCB signal estimates $\hat{\mathbf{s}} = \hat{\mathbf{W}}^{H}\mathbf{x}$ and SCB beamformer outputs $\hat{y} = \hat{\mathbf{w}}^{H}\mathbf{x}$, respectively, can be written equal in distribution to:


where $\mathbf{K}$ and $K$ are deterministic quantities, $\mathbf{z}_g$ ($E \times 1$) and $z_g$ are complex spherically symmetric noise terms whose pdfs depend on the functional form of $g(\cdot)$, and $P$ is complex beta distributed with parameters $L - N + E + 1$ and $N - E$, and is independent of both $\mathbf{z}_g$ and $z_g$ [8].

Abstract

We  present  a  new  class  of  discrete  chaotic  systems  (i.e. 
chaotic  maps)  that  can  effectively  encrypt  information. 
The  nonlinearity  of  these  systems  is  achieved  by 
designing  proper  piecewise  linear  functions  and  by 
using  modulo  operations.  The  chaotic  maps  are  used  as 
pseudo-noise  generators  and  as  the  synchronization 
mechanism  of  a  secure  spread-spectrum  communication 
system  design.  The  potential  for  automatic 
synchronization,  the  lack  of  periodicity  and  the 
extremely  large  parameter  spaces  that  our  chaotic  maps 
exhibit  offer  great  advantages  over  the  traditional 
Linear  Feedback  Shift  Registers  pseudo-noise 
generators  for  spread  spectrum  system  design. 

1  Introduction 

The  paper  presents  encryption  algorithms  that  operate  by 
restricting  the  system's  parameters  in  ranges  that  guarantee 
geometric  convergence  of  all  the  variables  of  the  receiving 
system  to  those  of  the  transmitting  one  under  the  influence 
of  a  common  transmitted  variable  (called  the  drive  signal). 
This process has only a few features in common with the
chaotic  synchronization  process  [1,2,5,6,9,10]  studied 
extensively  for  flows  (i.e.  systems  of  differential 
equations).  The  similarity  is  the  utilization  of  one  (or  more) 
variable(s),  constituting  the  drive  signal,  which  is  used  in 
order  to  entrain  the  receiver’s  system  to  the  transmitter. 
However,  the  design  of  the  presented  discrete  chaotic 
systems  is  completely  different. 

The improvements we introduce over previous related work [1,3,6,11] are threefold:

1.  We  present  a  methodology  based  on  general  piecewise 
linear  functions  that  exhibit  chaotic  dynamics.  These 
functions  are  easily  implementable  with  the  available 
electronic  technology  [4].  They  exploit  the  modulo 
operator  in  order  to  achieve  both  bounded  chaotic 
evolution  and  extreme  sensitivity  on  the  initial  conditions. 
Due  to  the  modulo  operation  both  the  range  of  parameter 
space  over  which  the  evolution  is  chaotic  and  the 
sensitivity  of  the  system  to  parameter  variations  are 
dramatically  increased.  The  consequence  is  an  immense 
parameter  space  even  for  small  encryption  systems. 

2.  We  present  a  general  form  for  our  chaotic  enciphering 
systems  and  we  establish  systematically  a  set  of 
convergence  conditions  on  the  variables  of  these  systems. 

3. Although the presented chaotic systems offer very strong encryption security and the possibility to encrypt bulk data (e.g. video data) at fast real-time rates, they are very sensitive to transmission noise. We combine the scheme presented in [11] with our chaotic enciphering systems and obtain the design of a secure spread spectrum communication system that can operate reliably even in the presence of a strongly noisy background. The designed robust communication systems offer high security, automatic and robust synchronization between the transmitting and receiving spreading sequences, and tolerance to intense noise levels. In contrast to the traditional spread spectrum techniques, the security of the system arises mainly from the inability to synchronize without possession of the encryption key, and only secondarily from the spreading of the spectrum. The chaotic enciphering systems can be implemented efficiently by exploiting the parallelism of the computational operations with dedicated array hardware [13].

The  paper  proceeds  by  presenting  the  chaotic 
encryption/decryption  method  in  Section  2.  Section  3  deals 
with  the  transmission  of  digital  information  over  channels 
with  strong  noise  background.  Section  4  discusses  the 
complexity  of  the  encryption  method,  while  in  section  5  the 
conclusions  are  presented  along  with  directions  for  future 
work. 

2  The  Chaotic  Enciphering  System 

The transmitter encodes the information by implementing the following system of difference equations:

$$\begin{aligned}
x_1(n+1) &= f_1(x_1(n)) + \varepsilon\, s(n) \\
x_2(n+1) &= f_2(x_3(n)) + \sum_{k=1}^{K} a_{2k}\, x_k(n) + c_2\, x_2(n)\sin(d_2 x_3(n)) \\
x_i(n+1) &= f_i(x_{i+1}(n)) + a_{i2}\, x_2(n) + \sum_{k=i}^{K} a_{ik}\, x_k(n) + c_i\, x_i(n)\sin(d_i x_{i+1}(n)), \quad i = 3, \ldots, K-1 \\
x_K(n+1) &= a_{KK}\, x_K(n) + a_{K2}\, x_2(n) + c_K\, x_K(n)\sin(d_K x_2(n))
\end{aligned} \qquad (1)$$

where $a_{ij}$, $c_i$, $d_i$, $i = 2, \ldots, K$, $j = i, \ldots, K$, are constants, $s(n)$ is the information signal and $x_2(n)$ the drive signal, i.e. the signal which is transmitted in order to force the synchronization.

The values of these parameters together with the values of the parameters of the functions $f_i$, $i = 1, \ldots, K-1$, form the encryption key. The equation for $x_1(n)$ adds the signal $s(n)$ to the chaotically evolving variable $x_1(n)$.

It is important to stress beforehand that, in contrast to the traditional encryption methods, the information signal is not transmitted in an encrypted form; rather, it is reconstructed from the variables of the proper chaotic system at the receiver. Formally, this means that we cannot express the encoded information $C$ as a function of the encryption key $K$ and the information $I$, i.e. $C = F(K, I)$. Given an information vector $I$ and an encryption key $K$, the ciphertext $C$ can take an infinity of possible values (due to the non-periodicity of the chaotic motion).

The arguments of the piecewise-linear functions $f_i$ are evaluated modulo $R_i$, i.e. $x_i(n) = x_i(n) \pmod{R_i} + L_i$, where $R_i = U_i - L_i$ and $[L_i, U_i]$ is the domain of definition of the function $f_i$. The effect of this rule is to limit the evolution of each function within its domain. The detailed expressions of the piecewise linear canonical form and the analysis of the chaotic dynamics of the system (1) are presented in [12,13].
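The modulo rule can be illustrated with a short sketch. It follows the formula literally as stated above (value modulo the span, then shifted by the lower bound); the function name `wrap` and the example bounds are hypothetical, and Python's `%` operator is used because it returns a non-negative remainder for a positive modulus.

```python
def wrap(x, lower, upper):
    """Fold x into [lower, upper) using x mod R + L with R = U - L."""
    span = upper - lower
    return x % span + lower


# Illustrative domain [-1, 1): large or negative values are folded back inside.
folded = [wrap(v, -1.0, 1.0) for v in (7.3, -0.4, 0.25, 123.456)]
```

Note that this literal form is not the identity for values already inside the domain (0.25 maps to -0.75 here); what matters for the scheme is that transmitter and receiver apply the same rule, so both stay bounded in the same way.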

The receiver extracts the information (information reconstruction) by implementing the following system, which is very similar to (1):

$$\begin{aligned}
x_K'(n+1) &= a_{KK}\, x_K'(n) + a_{K2}\, x_2(n) + c_K\, x_K'(n)\sin(d_K x_2(n)) \\
x_i'(n+1) &= f_i(x_{i+1}'(n)) + a_{i2}\, x_2(n) + \sum_{k=i}^{K} a_{ik}\, x_k'(n) + c_i\, x_i'(n)\sin(d_i x_{i+1}'(n)), \quad i = K-1, \ldots, 3 \\
x_1'(n) &= \frac{x_2(n+1) - f_2(x_3'(n)) - \sum_{k=2}^{K} a_{2k}\, x_k'(n) - c_2\, x_2(n)\sin(d_2 x_3'(n))}{a_{21}} \\
r(n) &= \left(x_1'(n+1) - f_1(x_1'(n))\right)/\varepsilon
\end{aligned} \qquad (2)$$

where $x_2'(n) = x_2(n)$ (since $x_2(n)$ is the transmitted signal that also functions as the drive [6] signal) and $r(n)$ is the recovered information signal.

Convergence Conditions. We prove that the information signal can be perfectly recovered when a set of convergence conditions on the parameters is satisfied. Let $x_2'(n) = x_2(n)$ be the common drive signal. We subtract the equations for the $K$th variable, $x_K$ and $x_K'$, of the systems (1) and (2) to get: $\Delta x_K(n+1) = (a_{KK} + c_K \sin(d_K x_2(n))) \cdot \Delta x_K(n)$. Clearly, a sufficient condition for $\Delta x_K(n) \to 0$ is $(|a_{KK}| + |c_K|) < 1$.

Similarly, the conditions $(|a_{ii}| + |c_i|) < 1$ for $i = K-1, \ldots, 3$ can be imposed as sufficient conditions for the convergence of the variables $x_{K-1}, x_{K-2}, \ldots, x_3$ respectively. Finally, from the equations for the drive variable $x_2(n)$ of both systems (1) and (2) we find $x_1'(n) \to x_1(n)$ as $n \to \infty$. It is then straightforward to conclude that $r(n) \to s(n)$. Thus, the information is reconstructed at the receiver.
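The contraction argument for the last variable can be checked numerically with a small sketch. Everything specific here is an assumption for illustration: the coefficient values, the function names `step_last` and `sync_gap`, and the use of a logistic map as a stand-in for the common drive signal. Only the update form and the condition |a| + |c| < 1 come from the text.

```python
import math


def step_last(x_k, drive, a_kk, a_k2, c_k, d_k):
    """Update of the last variable, driven by the common signal `drive`."""
    return a_kk * x_k + a_k2 * drive + c_k * x_k * math.sin(d_k * drive)


def sync_gap(steps, a_kk=0.5, a_k2=0.7, c_k=0.3, d_k=2.0):
    """Distance between transmitter and receiver copies after `steps` updates."""
    tx, rx = 1.0, -4.0            # deliberately different initial conditions
    drive = 0.3
    for _ in range(steps):
        tx = step_last(tx, drive, a_kk, a_k2, c_k, d_k)
        rx = step_last(rx, drive, a_kk, a_k2, c_k, d_k)
        drive = 3.9 * drive * (1.0 - drive)   # stand-in drive signal (assumption)
    return abs(tx - rx)


gap_start = sync_gap(0)
gap_end = sync_gap(200)
```

With |a| + |c| = 0.8 the gap shrinks by at least a factor 0.8 per step whatever the drive values are, which is the geometric convergence the condition guarantees.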

Multiple time-lags. The recurrent dependence can be easily extended to $M > 1$ time lags. By taking (again) the differences between (1) and (2) we derive the recurrence

$$\Delta x_i(n+1) = \sum_{m=0}^{M-1} a_{iim}\, \Delta x_i(n-m), \quad i = 3, \ldots, K \qquad (3)$$

This recurrence is stable if and only if all the roots $\rho_j$, $j = 0, \ldots, M-1$, of its characteristic equation

$$\rho^{M} - \sum_{m=0}^{M-1} a_{iim}\, \rho^{M-1-m} = 0$$

have modulus less than unity.

In practice, we design stable (i.e. able to synchronize) systems by selecting $M$ values $\rho_j$, $j = 0, \ldots, M-1$, such that $|\rho_j| < 1$. Then we determine the coefficients of the characteristic polynomial with roots $\rho_j$, $j = 0, \ldots, M-1$. These coefficients yield the appropriate values for the parameters $a_{iim}$, $m = 0, \ldots, M-1$.
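This design procedure can be sketched as follows: choose roots inside the unit circle, expand the monic polynomial that has them as roots, and read the lag coefficients off it. The sketch assumes real roots (so the coefficients are real) and the sign convention of a recurrence whose next value is a weighted sum of the M most recent values; the names `lag_coefficients` and `simulate_error` are hypothetical.

```python
def lag_coefficients(roots):
    """Coefficients a_0..a_{M-1} such that z^M - sum(a_m z^(M-1-m)) has the given roots."""
    poly = [1.0]                      # monic polynomial, highest power first
    for r in roots:
        nxt = poly + [0.0]            # multiply by z ...
        for k, c in enumerate(poly):
            nxt[k + 1] -= r * c       # ... and subtract r times the polynomial
        poly = nxt
    return [-c for c in poly[1:]]


def simulate_error(coeffs, initial, steps):
    """Iterate e(n+1) = sum_m a_m e(n-m); `initial` lists the M oldest values first."""
    hist = list(initial)
    for _ in range(steps):
        hist.append(sum(a * hist[-1 - m] for m, a in enumerate(coeffs)))
    return hist


# Illustrative choice: M = 2 roots at 0.5 and -0.3, both inside the unit circle.
coeffs = lag_coefficients([0.5, -0.3])
errors = simulate_error(coeffs, [1.0, 1.0], 60)
```

For these roots the polynomial is z^2 - 0.2 z - 0.15, so the coefficients are 0.2 and 0.15, and the simulated synchronization error decays at the rate of the largest root modulus.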

3  Noise  Tolerance 

The presented chaotic encryption systems offer great security levels (encryption complexity is discussed in Section 4). They are, however, very sensitive to distortion by transmission noise. Some schemes for noise-robust chaotic modulation have been proposed, notably the method presented in [11]. These schemes allow reliable transmission of information over channels that exhibit large noise levels (even with negative signal-to-noise ratio). However, they require accurate synchronization between the spreading sequences and, if synchronization is lost (even temporarily), the communication fails. We apply the presented chaotic enciphering maps, of the type of equation (1), in order to automatically keep intact the synchronization between the chaotic spreading sequences of the receiver and the transmitter. Moreover, the presented design requires possession of the encryption key for message retrieval. The extremely large parameter spaces that discrete chaotic enciphering systems exhibit also guarantee a high level of security.

The proposed method operates as follows. The transmitter and the receiver implement the chaotic encryption systems (1) and (2) respectively. The variable $x_2(n)$ is used as the drive signal which synchronizes the receiving system. The spreading of the information signal for noise tolerance is achieved by using any of the chaotically evolving variables, e.g. $x_1(n)$. The two systems (labeled $S_1$ and $S_2$ in Figure 1) can synchronize and thus generate the same chaotic sequence $x_1(n)$ (i.e. $x_1'(n) \to x_1(n)$, where $x_1'(n)$ is the reconstructed signal $x_1(n)$ at the receiver).

We should note here that the synchronization information (i.e. the drive signal) is transmitted reliably, like the information signal itself. In order to achieve this objective we use multiple (instead of one) chaotic time-series generators for information spreading. These generators are controlled by the main chaotic systems that are capable of achieving synchronization (i.e. $S_1$ and $S_2$). Figure 1 illustrates the cluster of $K$ systems used for chaotic spreading (spreading systems) and the main chaotic systems (synchronizing or entraining systems). The design operates as follows:

Each sample $x_1(n)$ is fed into each of the $K$ spreading
chaotic time series generators. These generators in turn
produce $K N_w$ samples (the parameter $N_w$ controls the
window size of the evolution of the spreading systems after
the resetting with the value $x_1(n)$ from the entraining
system). The generator $n$, $n = 1, \ldots, K$, provides to the
vector $w$ the values:

$$w(nN_w + j), \quad j = 1, \ldots, N_w - 1$$

where $w(nN_w) = x_1(n)$ for each $n$. These values (i.e.
$w(nN_w + j)$, $j = 1, \ldots, N_w - 1$) are generated by the evolution
of the $n$th chaotic system initialized to $x_1(n)$. We refer to
these generators as the spreading systems.

We use many spreading systems instead of one since the
exponential divergence of nearby trajectories that the
chaotic systems exhibit prevents the use of a large parameter
$N_w$ (since the very small differences between the
entraining systems are enlarged in a few iterations).
Moreover, in order to keep intact the pseudo-noise
properties of the spreading sequence we require the
evolution of the spreading systems (in addition to the
entraining systems) to be chaotic.

Following the approach of [11], each bit $s(k)$ is
encoded by using $N_s$ elements of the spreading chaotic
time series $w(n)$, $n = (k-1)N_s + 1, \ldots, kN_s$. The parameter
$N_s$, called the spreading parameter, controls the spreading
of the information over the transmitted signal. When the
transmitted bit is $s(k) = +1$ we transmit the values
unaltered, i.e. $w'(n) = w(n)$, $n = (k-1)N_s + 1, \ldots, kN_s$,
while if $s(k) = -1$ we transmit $w'(n) = -w(n)$,
i.e. the negatives of the
computed values. The drive signal $x_2$ is converted to a bit
stream $x_{2b}$ and is transmitted with the same method. The
receiver accumulates the reconstructed synchronization bit
stream in order to build the reconstructed drive signal
$\hat{x}_2$. These conversions can be accomplished easily with the
use of UART (Universal Asynchronous Receiver
Transmitter) chips.

In order to proceed with an analysis of the tolerance to
noise of this transmission scheme we denote the $N_s$-
dimensional vectors $w$ and $N_v$ as:

$$w(k) = [w((k-1)N_s + 1), \ldots, w(kN_s)],$$

$$N_v(k) = [v((k-1)N_s + 1), \ldots, v(kN_s)]$$

where $v(i)$ denotes the noise component added to the
value $w(i)$.

Since noise affects the quality of synchronization, the
reconstructed signal can be expressed as
$\hat{w} = w + \Delta w$, where $\Delta w$ is the deviation from the
synchronization state. The correlation sum $S_w(k)$ can be
expanded by introducing the inner products:

$$S_w(k) = \sum_{n=(k-1)N_s+1}^{kN_s} \big(s(k)\,w(n) + v(n)\big)\big(w(n) + \Delta w(n)\big)$$

$$= s(k)\langle w(k), w(k)\rangle + \langle N_v(k), w(k)\rangle + s(k)\langle w(k), \Delta w(k)\rangle + \langle N_v(k), \Delta w(k)\rangle \qquad (4)$$

As the spreading parameter increases, the probability that
the noise vector $N_v(k)$ has strong components in the
direction of $w(k)$ reduces rapidly. Thus
$|\langle N_v(k), w(k)\rangle| \ll \langle w(k), w(k)\rangle$ for large spreading
parameter. On the other hand, the inner product
$\langle N_v(k), w(k)\rangle$, for large Signal to Noise Ratio (SNR),
takes much smaller values than $\langle w(k), w(k)\rangle$, but for
small SNR it is necessary to use large spreading parameters
in order to obtain reliable transmission. The terms
$\langle N_v(k), \Delta w(k)\rangle$, $\langle w(k), \Delta w(k)\rangle$ obey similar rules, since
both $N_v(k)$ and $\Delta w(k)$ are random vectors. These
theoretical conclusions are supported by the simulation
results of Figure 2.

Therefore the correlation sum $S_w(k)$ in (4), for
sufficiently large spreading parameter, can be approximated
by:

$$S_w(k) \approx s(k)\langle w(k), w(k)\rangle$$

Since $\langle w(k), w(k)\rangle > 0$, the sign of $S_w(k)$ determines
the transmitted information bit.
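
The sign-of-correlation decoding described above can be illustrated with a minimal sketch. It makes several assumptions that are not part of the paper: a logistic map (with the hypothetical parameter `r = 3.99`) stands in for the piecewise-linear chaotic systems, synchronization is taken to be perfect (so the deviation from the synchronization state is zero), and the noise is white Gaussian. The names `logistic_sequence`, `transmit` and `decode` are illustrative only.

```python
import random


def logistic_sequence(x0, length, r=3.99):
    """Zero-centred chaotic samples from a logistic map (stand-in generator)."""
    x, out = x0, []
    for _ in range(length):
        x = r * x * (1.0 - x)
        out.append(x - 0.5)  # centre the samples around zero
    return out


def transmit(bits, w, n_s, noise_std, rng):
    """Send s(k) * w(n) + v(n) over each block of n_s chaotic samples."""
    received = []
    for k, s in enumerate(bits):
        for n in range(k * n_s, (k + 1) * n_s):
            received.append(s * w[n] + rng.gauss(0.0, noise_std))
    return received


def decode(received, w, n_s):
    """Recover each bit from the sign of the correlation sum over its block."""
    bits = []
    for k in range(len(received) // n_s):
        corr = sum(received[n] * w[n] for n in range(k * n_s, (k + 1) * n_s))
        bits.append(1 if corr > 0 else -1)
    return bits


rng = random.Random(0)
bits = [1, -1, -1, 1, 1, -1, 1, -1]
n_s = 256  # spreading parameter: chaotic samples per information bit
w = logistic_sequence(0.3, n_s * len(bits))
# Noise variance 0.25 exceeds the variance of w, i.e. the SNR is negative in dB.
decoded = decode(transmit(bits, w, n_s, noise_std=0.5, rng=rng), w, n_s)
print(decoded == bits)
```

With a spreading parameter of 256 the term proportional to the energy of the chaotic block dominates the noise cross term by roughly an order of magnitude, which is why the bits are recovered even though the noise power exceeds the signal power; shrinking `n_s` makes decoding errors progressively more likely.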


Figure 1  The architecture of the extended secure
communication system (entraining systems and spreading systems)

Figure 2  The probability of transmission error (vertical
axis) as a function of the Signal to Noise Ratio (horizontal
axis, dB units) for some values of the spreading parameter.

4  Complexity of the Enciphering Systems


We  have  already  presented  the  detailed  analysis  of  the 
fifth-order  chaotic  system  in  [13].  Therein  we  prove  that 
the  size  of  the  parameter  space  is  of  the  order 

K,  where  is  a  large  constant.  By  increasing 
the  complexity  of  the  piecewise  linear  functions,  we  can 
increase  even  further  this  huge  number  at  the  cost  of 
more  complex  implementation.  Generally,  for  a  [KM] 
chaotic  enciphering  system  (i.e,  K  variables  and  M 
lags),  the  parameter  space  size  grows  approximately 
with  an  expression,  derived  analytically  in  [13]: 


d 


^nv 


) 


-  UV  '  1=1 

where  Rav^^av  ^^e  the  average  range  and  sensitivity  of 
parameters,  Nj^  is  the  number  of  piecewise  linear 
functions,  the  number  of  breakpoints  of  the 


piecewise  linear  function  // ,  the  domain  of 

definition  of  ,  and  are  the  sensitivities  over 

the  breakpoint  position  and  breakpoint  value 
respectively.  Some  typical  values  for  these  parameters 
are: 

AE,-  «  A/;  «  1 0^ ,  5^  «  33  •  1 0“* ,  7?^  «=  2 
Nf,^  =20,Nf  =K-\  =  4,M  =  3,Df^  =100 


With  the  above  values  the  parameter  space  size 

becomes  of  the  order  0(1.1  -10^^® ) ! 

It is straightforward to conclude that the parameter space
size grows doubly exponentially with the order [KM]
of the chaotic system and exponentially with the
complexity of the piecewise linear functions (note that the
exponentiation with the number of breakpoints is repeated $N_f$ times).

This double exponential complexity should be contrasted
with the exponential ones of the more common encryption
algorithms in use. Specifically, $2^{56}$ operations are required to
break the DES (Data Encryption Standard) algorithm and
$2^{128}$ operations for the newer IDEA block cipher system
[8].

The Chaotic Enciphering Systems can be implemented
easily with simple hardware (e.g. current mode techniques
[4]). The computational complexity of the presented coding
algorithms is limited to a few additions and multiplications.
Specifically, in order to evaluate a piecewise linear function
we have to evaluate only $n+1$ multiplications and $2n + 2$
additions (where $n$ is the number of breakpoints of these
functions). Furthermore, with the exploitation of parallel
hardware, $\log_2 n + 2$ levels of gate delays are sufficient.
Thus, besides the flexibility, scalability (with the arbitrary
choice of breakpoints, domains and ranges) and simplicity
that the piecewise-linear design offers, it also allows faster
enciphering/deciphering rates compared with the selection
of more complex alternatives (e.g. DES [7], IDEA [8]).


5  Conclusions

In this paper we have presented and analysed chaotic
systems of difference equations which display chaotic
evolution over a wide range of parameter configurations
and are amenable to parallel systolic array
implementation. These chaotic systems exhibit an immense
sensitivity to the parameter configuration. This makes them
ideal for application to secure communication systems over
reliable digital computer networks. In addition, we have
presented secure transmission schemes (also based on
chaotic difference systems) that are capable of reliably
transmitting digital information over channels with very low
(even negative) Signal to Noise Ratio.

Although the presented cryptosystems provide an
effective method for data encryption, they are inefficient
for solving the problem of key management. In contrast,
public key cryptosystems, such as RSA [7], support an
effective scheme for key management. This suggests the
use of a hybrid approach exploiting the best of both
cryptosystems as the basis for the practical design of a
secure communication system. For example, the RSA
algorithm may be used for authentication, and the chaotic
difference systems for the bulk encryption at very fast rates.

Abstract

The Propagator method (PM), as well as SWEDE (subspace
method without eigendecomposition) [5] and BEWE
(bearing estimation without eigendecomposition) [4], [8],
belong to a class of subspace-based methods for direction-
of-arrival (DOA) estimation which do not require the
eigendecomposition of the sample covariance matrix of the
received signals and which only use linear operations on the
covariance matrix of the received data. These methods can
therefore be implemented with a reduced complexity
compared to MUSIC. In [1], [2], a method was proposed for
estimating the power of sensor noise and the DOA using the
PM. The goal of the present paper is to statistically analyse
these noise power and DOA estimates.


1  Introduction 

Most of the subspace-based methods for DOA estimation
require the eigendecomposition of the sample covariance
matrix or the singular value decomposition of the data
matrix to estimate the signal and/or noise subspaces.
Unfortunately, in applications like high resolution passive sonar
systems where the number of sensors is large, the use of
such methods is unattractive owing to their intensive
computational load. The PM, as well as SWEDE and
BEWE, belong to a class of subspace-based methods for
DOA estimation which do not need any eigendecomposition
and which only use linear operations on the covariance
matrix of the sensor outputs. These methods therefore have a clear
potential for real-time applications. The PM uses a linear
operator referred to as the "Propagator" which only depends
on the steering vectors and which can be easily extracted
from the data by a least squares process. A non-asymptotical
(i.e. finite number of snapshots) performance analysis of
the PM has been reported in [3]. It was found that the PM
performs like MUSIC at high and moderate SNR. In [1],
[2], a joint estimation of the noise power and the Propagator
from the data was proposed. The goal of the present paper
is to analyse the asymptotical (large number of snapshots)
performance of the PM for estimating the noise power and
the DOAs. In section 2 the PM proposed in [1], [2] for the
joint estimation of the noise power and the DOAs is briefly
recalled. Section 3 is devoted to the asymptotical statistical
analysis of the PM, and expressions for the variance of the
noise power and DOA estimates are established. Section 4
provides numerical examples comparing the
performance of the PM with BEWE, SWEDE and MUSIC.
Section 5 concludes the paper.


2  The  Propagator  Method 


Consider an array of $M$ sensors on which $K$ incident
narrowband point sources impinge ($M > K$). The observation
vector $x$ of the sensor outputs can be written:

$$x = As + n \qquad (1)$$

where $x \in \mathbb{C}^{M \times 1}$ is the noisy data vector, $s \in \mathbb{C}^{K \times 1}$
is the vector of the signal amplitudes, $n \in \mathbb{C}^{M \times 1}$ is an
additive noise, and $A = [a(\theta_1), \ldots, a(\theta_K)] \in \mathbb{C}^{M \times K}$ is
the matrix of the steering vectors $a(\theta_i) \in \mathbb{C}^{M \times 1}$, where $\theta_i$,
$i = 1, \ldots, K$, is the direction of arrival of source $i$, measured
relative to the normal of the array. Under the assumption that
the noise is spatially and temporally white with power $\sigma^2$, the covariance
matrix of $x$ is given by

$$R = E[xx^H] = ASA^H + \sigma^2 I_M \qquad (2)$$

where $S = E[ss^H]$ is the signal covariance matrix of
dimension $K \times K$, assumed to be nonsingular.

The definition of the Propagator relies on the partition of
the steering matrix according to

$$A^H = [\,A_1^H \ \ A_2^H\,] \qquad (3)$$

where $A_1^H$ and $A_2^H$ have $K$ and $M - K$ columns, respectively.
Under the assumption that $A_1$ is nonsingular, the
Propagator is the unique linear operator $P \in \mathbb{C}^{K \times (M-K)}$
equivalently defined by:

$$P^H A_1 = A_2, \quad \text{or} \quad A^H \begin{bmatrix} P \\ -I_{M-K} \end{bmatrix} = A^H Q = 0 \qquad (4)$$

It follows that matrix $Q$ spans the nullspace of $A^H$. This
matrix can be estimated from the sample covariance matrix:

$$\hat{R} = \frac{1}{N} \sum_{t=1}^{N} x(t)\,x^H(t) \qquad (5)$$

where $N$ is the number of snapshots. In [1], [2], a method
was proposed for estimating both the noise power and the
Propagator $P$ from the sample covariance matrix (5). This
relies on the following. Consider the modified covariance
matrix

$$R - \delta I_M = [\,G(\delta) \ \ H(\delta)\,] \qquad (6)$$

where $\delta$ is positive and $G(\delta)$ and $H(\delta)$ are matrices of
dimension $M \times K$ and $M \times (M - K)$, respectively. The
following proposition has been proved in [1].

Proposition: Assuming that the $(M - K) \times K$ matrix
$A_2$ in (3) is of rank $K$, $(\hat{P} = P, \delta = \sigma^2)$ is the unique
solution of

$$H(\delta) = G(\delta)\hat{P} \qquad (7)$$

if, and only if, $M - K > K$.
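
As a numerical illustration of definition (4), the following sketch builds the Propagator for a noise-free case and checks that it annihilates the steering vectors of the sources. It is only a sketch under stated assumptions: a half-wavelength uniform linear array with $M = 4$ sensors and $K = 2$ sources is assumed, the exact steering matrix is used instead of data, and the function names (`steering`, `propagator_hermitian`, `residual`) are hypothetical.

```python
import cmath
import math


def steering(theta_deg, m):
    """Steering vector of a half-wavelength ULA; theta is measured from the array normal."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(m)]


def propagator_hermitian(thetas, m):
    """Return P^H = A2 * inv(A1) for K = 2 sources, where A1 is the top 2x2 block of A."""
    a, b = steering(thetas[0], m), steering(thetas[1], m)
    det = a[0] * b[1] - b[0] * a[1]
    inv = [[b[1] / det, -b[0] / det], [-a[1] / det, a[0] / det]]
    return [[a[r] * inv[0][c] + b[r] * inv[1][c] for c in range(2)]
            for r in range(2, m)]


def residual(p_h, theta_deg, m):
    """Squared norm of Q^H a(theta) = P^H a_1(theta) - a_2(theta)."""
    v = steering(theta_deg, m)
    rows = [p_h[r][0] * v[0] + p_h[r][1] * v[1] - v[2 + r] for r in range(m - 2)]
    return sum(abs(x) ** 2 for x in rows)


M = 4
doas = (10.0, 40.0)
P_H = propagator_hermitian(doas, M)
at_sources = [residual(P_H, t, M) for t in doas]
off_source = residual(P_H, 25.0, M)
print(at_sources, off_source)
```

The residual vanishes (up to rounding) at the two source directions and is clearly non-zero elsewhere, which is the property the DOA search exploits. The sketch uses the squared norm of $Q^H a(\theta)$ rather than the orthogonal projector; both quantities have the same zeros.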


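Consider now the following partition

$$R = \begin{bmatrix} G_1 & H_1 \\ G_2 & H_2 \end{bmatrix}$$

where the block columns have $K$ and $M - K$ columns and the block
rows have $K$ and $M - K$ rows, respectively.
According to (2), matrices $G_2$ and $H_2$ satisfy:

$$G_2 = A_2 S A_1^H$$

$$H_2 = A_2 S A_2^H + \sigma^2 I_{M-K}$$

It can easily be seen that

$$\sigma^2 = \frac{\mathrm{tr}\{H_2 \Pi\}}{\mathrm{tr}\{\Pi\}}$$

where $\Pi = I_{M-K} - G_2 G_2^{\dagger} = I_{M-K} - A_2 A_2^{\dagger}$ and
where tr{.} denotes the trace operator and $(.)^{\dagger}$ is the Moore-
Penrose pseudo-inverse. Then, a possible estimate of $\sigma^2$ can
be obtained by (see also [2])

$$\hat{\sigma}^2 = \frac{\mathrm{tr}\{\hat{H}_2 \hat{\Pi}\}}{\mathrm{tr}\{\hat{\Pi}\}} \qquad (11)$$

where $\hat{H}_2$ and $\hat{\Pi}$ are estimates of $H_2$ and $\Pi$,
respectively, and where

$$\hat{H}_2 = \frac{1}{N}\sum_{t=1}^{N} x_2(t)\,x_2^H(t), \qquad \hat{G}_2 = \frac{1}{N}\sum_{t=1}^{N} x_2(t)\,x_1^H(t) \qquad (12)$$

where $x(t) = [x_1^T(t) \ \ x_2^T(t)]^T$, with $x_1(t)$ and $x_2(t)$ of
dimension $K$ and $M - K$, respectively.

3  Statistical Analysis

According to the central limit theorem, it can be checked
that $\hat{R} - R = O(1/\sqrt{N})$, which means that $\sqrt{N}(\hat{R} - R)$
is bounded in probability when $N \to \infty$. It then follows
from (5) and (12) that $\hat{H}_2 - H_2 = O(1/\sqrt{N})$ and that
$\hat{G}_2 - G_2 = O(1/\sqrt{N})$. With a first order expansion of $\hat{H}_2$
and $\hat{G}_2$ we easily derive [9] that the estimator (11) provides a
consistent estimate of $\sigma^2$. Similarly, a consistent estimate
$\hat{P}$ of $P$ can be obtained from $\hat{R} - \hat{\sigma}^2 I_M$ according to (7)
as

$$\hat{P} = \hat{G}^{\dagger}\hat{H} \qquad (13)$$

We now derive large sample variance expressions for the
noise power estimate (11) and the DOA estimates which
minimize

$$f(\theta) = a^H(\theta)\,\hat{\Pi}_Q\,a(\theta) \qquad (14)$$

where $\hat{\Pi}_Q = \hat{Q}\hat{Q}^{\dagger}$ is the orthogonal projector onto the
noise subspace.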


Theorem 1: The large sample (for $N \gg 1$) variance
of $\hat{\sigma}^2$ (11) is given by

Proof:  See  the  Appendix. 

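Now from a first-order approximation of the first
derivative $f'(\theta_i)$ of $f(\theta)$ around the DOA estimates $\hat{\theta}_i$, and after
a first order expansion of $\hat{\Pi}_Q$, it can be shown that

$$\hat{\theta}_i - \theta_i \simeq -\frac{\mathrm{Re}\{d_i^H \tilde{\Pi}_Q a_i\}}{d_i^H \Pi_Q d_i} \qquad (16)$$

where $\tilde{\Pi}_Q \simeq Q(Q^H Q)^{-1}\tilde{Q}^H$, $d_i = \frac{da(\theta_i)}{d\theta_i}$,
$a_i = a(\theta_i)$ and where

$$\tilde{Q} = \begin{bmatrix} \tilde{P} \\ 0 \end{bmatrix}, \qquad \tilde{P} = (G^H G)^{-1} G^H [\hat{H} - \hat{G}P] \qquad (17)$$

In expressions (17), $\tilde{Q}$ and $\tilde{P}$ are the estimation error
matrices of $Q$ and $P$, respectively. We now state and prove
the following result.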

Theorem 2: Let $\{\hat{\theta}_i\}$ be the DOAs estimated by the PM.
The asymptotical variance (for $N \to \infty$) of $\hat{\theta}_i$ is given by

$$E[(\hat{\theta}_i - \theta_i)^2] = \frac{\sigma^2}{2N\gamma_i}\left[(S^{-1})_{ii} + \sigma^2 \big(S^{-1}(A^H A)^{-1} S^{-1}\big)_{ii}\right] \qquad (18)$$

where $(.)_{ii}$ denotes the $i$-th element in the diagonal of the
bracketed matrix and $\gamma_i = d_i^H \Pi_Q d_i$. Note that this
expression is identical to that obtained for MUSIC [6].

Proof :  See  the  Appendix. 

4  Simulations 

In this section we present some numerical examples to
lend support to the theoretical results obtained here. In
all the examples, we consider two uncorrelated sources of
equal power impinging on a uniform linear array of
equispaced sensors.

Example 1: In this example we illustrate the performance of
the proposed method for estimating the noise power.
Consider $\theta_1 = 0°$ and $\theta_2 = 20°$. First, the number of sensors $M$
varies from 5 to 40 with SNR = 0 dB. Next, the SNR varies
from -2 to 12 dB with $M = 20$. In all cases $N = 400$
snapshots were used for each of the 100 independent
trials. Figure 1 displays the empirical root mean square errors
(RMSE) and the theoretical standard deviations calculated
using expressions (11) and (15), respectively. In the same
figure, the performance of the proposed noise power
estimator is compared with another 'linear method' (LM) and
a so-called 'eigenvalue method' (EM) proposed and
statistically analysed in [7]. It can be seen that the RMSE of
the proposed noise power estimator and that of the EM are
comparable for large values of $M$. Note that, unlike the
proposed method, the variance of the LM does not decrease
as $M$ increases.

Example 2: We compare in this example the performance of
the PM with SWEDE (G), BEWE and MUSIC for
estimating the DOAs $\theta_1 = 5°$ and $\theta_2 = 15°$ from 50 snapshots.
First, the number of sensors $M$ varies from 10 to 45 with
SNR = 0 dB and then the SNR varies from -4 to 10 dB with
$M = 20$. In both cases the empirical RMSE is based on
200 independent trials. The empirical RMSE and the
theoretical st. dev. of the estimates of the source $\theta_1 = 5°$ are
exhibited in Figure 2. This figure verifies our theoretical
result, i.e., for a spatially and temporally white noise model,
the PM variance is equal to that of MUSIC. Consequently,
this version of the PM performs better than SWEDE (G) and
BEWE.

5  Concluding  remarks 


noise scenarios. While the eigendecomposition of an $M \times M$
matrix requires $O(M^3)$ operations, the implementation of
the PM is of order $O(M^2)$, which is more attractive from
the computational point of view.


Appendix 

Proof of Theorem 1. First-order approximations for $\hat{H}_2\hat{\Pi}$
and $\hat{\Pi}$, which appear in (11), and the use of $\Pi G_2 = 0$ allow
us to write


^2 


(A.1) 


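where

$$\alpha = \mathrm{tr}\{\tilde{H}_2 \Pi - H_2 G_2^{\dagger H} \tilde{G}_2^H \Pi - H_2 \Pi \tilde{G}_2 G_2^{\dagger}\},$$

$$\beta = \mathrm{tr}\{G_2^{\dagger H} \tilde{G}_2^H \Pi + \Pi \tilde{G}_2 G_2^{\dagger}\}, \quad \text{and} \quad \eta = \mathrm{tr}\{\Pi\} \qquad (A.2)$$

By noting that $H_2 \Pi = \sigma^2 \Pi$ and $G_2^{\dagger}\Pi = \Pi G_2 = 0$
and by substituting (12) into (A.2) it can easily be verified
that $\beta = 0$ and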


(t)Mx{t),  M  = 


t=l 


0  0 
0  n 


After  a  straightforward  derivation  we  obtain 


E[{a^  -  ay]  =  E 


a 


IT 


21 


(A.3) 


(A.4) 


Using  the  formula  for  the  expectation  of  four  random 
matrices  proposed  in  [10],  we  obtain: 


E[a'^]  =  tr{Mfi}2  + 


(A.5) 


It  follows  from  expressions  (A.3),  (8),  and  (9)  that 

Finally, note that $\mathrm{tr}\{\Pi\} = (M - K) - \mathrm{tr}\{A_2 A_2^{\dagger}\}$.

Under the assumptions that $A_2$ has full rank, equal to
$K$, and that $M - K > K$, it can easily be shown that
$A_2 A_2^{\dagger}$ has $M - 2K$ eigenvalues which are equal to 0 and
that the $K$ remaining eigenvalues are equal to 1. Hence
$\mathrm{tr}\{\Pi\} = M - 2K$.


The purpose of this work was to statistically analyse the
performance of the PM for DOA estimation when the noise
power is estimated and removed from the sample covariance
matrix of the sensor outputs according to the method
proposed in [1], [2]. It has been shown that this version of the
PM asymptotically performs like MUSIC in spatially white


Proof  of  theorem  2.  Note  from  (16)  that 

g[(ReM)^] 


Em  -  Oif 


Ti 


(A.7) 


where  p  =  —cx^P 


H 


ai^i  with  ai^i  being  the  z-column 
of  A\  and  a,  =  Then,  it  follows  from  (5)  and  (6) 
Figure 1. Empirical RMSE and theoretical st. dev. of the
noise power estimates versus (a) Number of sensors, (b)
SNR. The proposed method (*)-(- -), the LM (x)-(-.), and
the EM (o)-(-), (empirical RMSE)-(theoretical st. dev.).

Figure 2. Empirical RMSE and theoretical st. dev. of the
estimate of $\theta_1 = 5°$ versus (a) Number of sensors, (b)
SNR. The PM (*)-(-), SWEDE (G) (x)-(-.), BEWE (+)-(-), and
MUSIC (o)-(-), (empirical RMSE)-(theoretical st. dev.).


that  G  =  i  (0,  H  =  Eili  (<) 

and  p  becomes 

p  =  (A.8) 

where  matrix  M  =  G(G"  G) "  ^  ai  af  is  of  dimension 
M  X  {M  -  K).  By  inserting  (A.8)  in  (A.7),  using  once 
again  the  result  of  [10]  and  checking  that  QM  R  =  0, 
MQ^  R  =  0,  we  find 

E[{ei  -  Oif]  =  [tr{MQ" (A.9) 

It  can  easily  be  verified  that 
ti{MQ^ RQM^ R)  = 

(A.10) 

Hence (18).

Abstract

In the performance evaluation of source localisation
methods, resolution is not the only criterion.
Degradations may occur due to parasitic peaks in the
spectrum, which may be connected to high sidelobes in
the beam pattern or to ambiguities themselves. The aim
of this paper is to study the presence of ambiguities in an
array of given planar geometry. The ambiguity problem
for an arbitrary array is examined and ambiguous
situations are identified. We propose a general
framework for the analysis and so obtain a
generalisation of results given in recent publications [2],
[3] for rank one and two ambiguities. For rank $k > 3$
ambiguities, results focus on linear arrays, for which we
derive original and synthetic results. Some interesting
results are derived for non-uniform linear arrays,
including sparse linear arrays [4].

1.  Introduction 

In the performance evaluation of source localisation
methods, resolution is not the only criterion.
Degradations may occur due to parasitic peaks in the
spectrum, which may be connected to high sidelobes in
the beam pattern (sometimes referred to as
quasi-ambiguities) or to ambiguities themselves. These
ambiguities arise when the array manifold intersects itself
or when a manifold vector can be written as a linear
combination of two or more manifold vectors [1]. The
aim of this paper is to study the presence of ambiguities
in array geometry.

The ambiguity problem for an arbitrary array will be
examined and ambiguous situations will be identified. We
propose a general framework for the analysis and so
obtain a generalisation of results given in recent
publications [2], [3].

In section 2, notations and definitions of ambiguity are
introduced. In section 3 a study of rank one ambiguous
arrays is presented. Section 4 presents the main results
obtained for rank two ambiguous arrays. This study is
made for arrays of arbitrary geometry. In section 5, the
study is restricted to linear arrays for rank three
ambiguities. In section 6 we derive original and synthetic
results for rank $k > 3$ ambiguities restricted to linear
arrays. Some interesting conclusions may be drawn for
non-uniform linear arrays, including sparse linear arrays
[4]. Section 7 includes some conclusions.

2.  Problem  formulation  and  definitions 

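Consider an array with $M$ sensors receiving $N$
narrowband signals impinging on the array from $N$
different locations $\theta_1, \ldots, \theta_N$. Denote by
$A = [a(\theta_1), \ldots, a(\theta_N)]$ the matrix whose
columns are the source steering vectors, also called the array
manifold vectors.

The simultaneous localisation of $N$ sources is only
possible if the array manifold vectors $a(\theta_1), \ldots, a(\theta_N)$
are linearly independent.

An array is said to be rank $k$ ambiguous for a set of $k+1$
directions of arrival $\theta_1, \ldots, \theta_{k+1}$ if matrix $A$ is singular
but of rank $k$. This can be written:

$$\exists (\alpha_1 \neq 0, \ldots, \alpha_{k+1} \neq 0) \in \mathbb{C}^{k+1} \ \text{so that} \ \alpha_1 a(\theta_1) + \ldots + \alpha_{k+1} a(\theta_{k+1}) = 0 \qquad (1)$$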

3.  Rank  one  ambiguities  (for  general  arrays) 

This case occurs when one array manifold vector
$a(\theta_1)$ can be written as a complex scalar multiple of
another manifold vector $a(\theta_2)$, where $\theta_1 \neq \theta_2$:

$$\exists (\alpha_1 \neq 0, \alpha_2 \neq 0) \in \mathbb{C}^2 \ \text{so that} \ \alpha_1 a(\theta_1) + \alpha_2 a(\theta_2) = 0 \qquad (2)$$

In such a case, the array cannot distinguish
between two waves with bearings $\theta_1$ or $\theta_2$.

The wavefronts are assumed to be straight lines lying in the
same plane as the sensors. With $\vec{k}_1$ and $\vec{k}_2$ being the
ambiguous wave vectors for the array under
consideration, the phase delay of signal $n$ from sensor $m$
to sensor one is:

$$\varphi_{mn} = \vec{k}_n \cdot \vec{p}_m \qquad (3)$$

where $\vec{p}_m$ denotes the position of the $m^{th}$ sensor in half
wavelengths. Equation (2) is then equivalent to the
condition:

$$\alpha_1 e^{j\varphi_{m1}} + \alpha_2 e^{j\varphi_{m2}} = 0 \iff \varphi_{m1} = \varphi_{m2} \pmod{2\pi} \quad \text{for } m = 1, \ldots, M \qquad (4)$$

The ambiguity condition can be written:

$$\forall \vec{p}_m, \ \exists\, n_m \ \text{integer}: \ (\vec{k}_1 - \vec{k}_2) \cdot \vec{p}_m = 2\pi n_m \qquad (5)$$

with $|\vec{k}| = 2\pi/\lambda$, where $\lambda$ stands for the wavelength. It
can be given the following geometrical interpretation:

Fig. 1: Stars represent some possible sensor positions
for a rank one ambiguous array. The horizontal axis is
defined by the vectors $\vec{k}_1$ and $\vec{k}_2$.

The consequence is that, for arrays of arbitrary
geometry, rank one ambiguities can arise if all of the sensors
are located on a set of parallel lines separated by a
distance $l \geq \lambda/2$. In the case of a linear array this result
recovers the classical Shannon condition. In the general
case, it establishes conditions for ambiguity and can then
give the ambiguous directions [5].
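
For a linear array, the rank one condition can be checked numerically by testing whether two steering vectors are collinear. The sketch below is an illustration under assumptions that are not part of the paper: sensor positions are given in half wavelengths along the array axis, directions are measured from the array axis, and the helper names `steering` and `is_rank_one_ambiguous` are hypothetical.

```python
import cmath
import math


def steering(positions, theta_deg):
    """Steering vector of a linear array; positions in half wavelengths,
    theta measured from the array axis."""
    c = math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * p * c) for p in positions]


def is_rank_one_ambiguous(positions, theta1, theta2, tol=1e-9):
    """Two steering vectors are collinear iff |<a1, a2>| equals the number
    of sensors (equality case of the Cauchy-Schwarz inequality)."""
    a1, a2 = steering(positions, theta1), steering(positions, theta2)
    inner = sum(x.conjugate() * y for x, y in zip(a1, a2))
    return abs(abs(inner) - len(positions)) < tol


wide = [0, 2, 4, 6]   # spacing of one full wavelength
dense = [0, 1, 2, 3]  # spacing of half a wavelength

wide_ambiguous = is_rank_one_ambiguous(wide, 60.0, 120.0)
dense_ambiguous = is_rank_one_ambiguous(dense, 60.0, 120.0)
print(wide_ambiguous, dense_ambiguous)
```

With one-wavelength spacing, the directions 60° and 120° differ by exactly one in cosine, so every sensor sees a phase difference that is a multiple of $2\pi$ and the two directions cannot be told apart; with half-wavelength spacing the same pair is resolved.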

4.  Rank  two  ambiguities  (For  general  planar 
arrays) 

This situation occurs when the array manifold line
intersects a plane in more than two points. In such a case,
one manifold vector can be written as a linear
combination of two other manifold vectors, which may
be written:

$$\exists (\alpha_1, \alpha_2, \alpha_3): \ \alpha_1 a(\theta_1) + \alpha_2 a(\theta_2) + \alpha_3 a(\theta_3) = 0 \quad (\alpha_1 = 1) \qquad (6)$$

with $a(\theta_n) = [e^{j\varphi_{1n}}, \ldots, e^{j\varphi_{Mn}}]^T$ and
$\varphi_{mn} = \vec{k}_n \cdot \vec{p}_m$. Sensor 1 is taken as a reference, so
$\vec{p}_1 = 0$. Therefore for
sensor 1, $\varphi_{11} = \varphi_{12} = \varphi_{13} = 0$.

The ambiguity condition (6) can thus be written:

$$1 + \alpha_2 + \alpha_3 = 0 \qquad (7)$$

This relation can be interpreted geometrically in the
complex plane as a triangle whose sides are the vectors
associated with the complex numbers $1, \alpha_2, \alpha_3$.

Fig. 2: Interpretation of (7) in the complex plane.

For sensor $m$, ambiguity condition (6) becomes:

$$e^{j\varphi_{m1}} + \alpha_2 e^{j\varphi_{m2}} + \alpha_3 e^{j\varphi_{m3}} = 0 \qquad (8)$$

In the complex plane the product by $e^{j\varphi}$ is a rotation.
Thus the sides $1, \alpha_2, \alpha_3$ turn respectively by angles
$\varphi_{m1}$, $\varphi_{m2}$, $\varphi_{m3}$ and must again form a triangle
according to relation (8). The lengths of the sides of the
triangle must be the same, therefore the triangles are
deduced from one another by an isometry. This isometry
is a rotation or a rotation plus a symmetry. Thus the
triangles corresponding to the different values of $m$
belong to two sub-families: the rotation family and the
rotation plus symmetry family.

The following results can then be derived [5]:

1) Any rank two ambiguous array may be
split into two subarrays $a^1(\theta)$ and $a^2(\theta)$, where $a^1(\theta)$
and $a^2(\theta)$ are rank one ambiguous for three directions
$\theta_1$, $\theta_2$ and $\theta_3$, i.e.: $a^i(\theta_1) = a^i(\theta_2) = a^i(\theta_3)$.

2) As a consequence, the sensors of each
subarray are located at the nodes of a two-dimensional
lattice, for arbitrary directions of arrival. Figure 3 illustrates
one of these lattices.

Fig. 3: Stars represent some possible sensor positions
for ambiguous subarrays (for directions $\theta_1$, $\theta_2$ and $\theta_3$).

3) The second lattice, corresponding to the
second subarray, is related to the first by an arbitrary translation.
This is a simpler demonstration and a generalisation of
Theorem 1 given by Lo and Marple in reference [2].

5.  Rank three ambiguities (For linear arrays)

By generalisation of the previous results, we infer that
the sensors of a linear array can be split into three
subarrays. In each subarray the sensors lie on a grid of
spacing denoted $a$. The three grids are translated with
respect to one another. For the first grid:

(9)

where $\vec{v}$ is the unit vector of the linear array.

Let us denote $\vec{k} = \frac{2\pi}{\lambda}\vec{u}$. If $a$ is the greatest common
divisor of the inter-sensor distances in a subarray, the
ambiguity condition can be written [5]:

$$\exists\, n_i \ \text{integer, so that} \ \vec{v} \cdot (\vec{k}_i - \vec{k}_1) = n_i \frac{2\pi}{a} \qquad (10)$$

Thus, all the sets of vectors $\vec{u}_i$ which can be
projected on the grid of step $\lambda/a$ are ambiguous. By
arbitrary translation of this grid, an infinity of ambiguous
direction sets can be obtained.

Fig. 4: Rank three ambiguity for a linear array.

We recover here the notion of ambiguous generator set
introduced by Proukakis and Manikas [3]. Their
ambiguous generator set is the set of parallel lines
crossing the axis defined by $\vec{v}$ at the point A: (1,0).

It appears clearly on this figure that the condition for
no rank three ambiguities is: $3(\lambda/a) > 2$.

Example of Proukakis and Manikas [3]

Fig. 5: Sensor positions on the array, in half
wavelengths: -2.3, -1.1, 1.1, 2.3.


In their example, three sources are located at 0°,
55.582° and 82.505°. The considered array is a sparse
linear array.

Two parasitic peaks appear in the MUSIC spectrum,
located at 107.719° and 137.657°. Because the array is
ambiguous, the MUSIC algorithm has provided five
directions rather than three.

Fig. 6: MUSIC spectrum obtained in the Proukakis
example.

This phenomenon was not clearly explained. Application
of the proposed study allows us to predict these
ambiguous directions of arrival. By application of the results
of section 4, it is easy to see that this ambiguity is not a
rank two ambiguity. Three subarrays can be found which
permit us to determine the ambiguous directions of
arrival. Figure 7 depicts the splitting into subarrays
which explains figure 6.

[Figure: the four sensors at -2.3, -1.1, 1.1 and 2.3, with grid
spacing a = 4.6. 1st subarray: sensors at -2.3 and 2.3.
2nd subarray: sensor at -1.1. 3rd subarray: sensor at 1.1.]

Fig. 7 : Splitting into subarrays.


In this example we take a = 4.6, which is the array
aperture in half wavelengths. Therefore \lambda/a = 1/2.3. The
second (respectively the third) subarray corresponds to
the single sensor number two (resp. number three). The
ambiguous directions can all be predicted with the
following construction :


Fig. 8 : Graphic determination of the two rank three
ambiguous directions for the linear array of Proukakis'
example.

There are two rank three ambiguities. The predicted
directions of arrival are exactly those obtained in the
MUSIC spectrum.
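The grid construction can be checked numerically. The sketch below is not taken from the paper: it assumes that directions are measured from the array axis, so that the projection of a unit direction vector on the array is cos(theta), and that the grid of step \lambda/a passes through the source at 0°. The function name grid_directions is hypothetical.

```python
import math

def grid_directions(spacing_half_wavelengths, anchor_deg=0.0):
    """Directions (degrees from the array axis) whose projections cos(theta)
    fall on a grid of step lambda/a passing through cos(anchor_deg)."""
    a_in_wavelengths = spacing_half_wavelengths / 2.0  # 4.6 half wavelengths = 2.3 lambda
    step = 1.0 / a_in_wavelengths                      # lambda / a
    anchor = math.cos(math.radians(anchor_deg))
    # Move up to the largest grid point that does not exceed +1.
    n_up = math.floor((1.0 - anchor) / step + 1e-12)
    top = anchor + n_up * step
    directions = []
    n = 0
    # Walk down the grid while the projection stays inside [-1, 1].
    while top - n * step >= -1.0 - 1e-12:
        c = max(-1.0, min(1.0, top - n * step))
        directions.append(math.degrees(math.acos(c)))
        n += 1
    return directions

angles = grid_directions(4.6)
print([round(x, 3) for x in angles])
```

Under these assumptions the grid holds five points inside [-1, 1], so five directions are returned: the three sources and the two parasitic peaks quoted above, to within the rounding of the published angles.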

6.  Ambiguities  of  linear  arrays  (any  rank) 

By generalisation of the rank two case, we infer the
following result : a rank k ambiguous general array can
be split into k rank one ambiguous subarrays, for
(k + 1) simultaneous directions. We will now focus on
linear arrays, for which we can obtain a general result.

A linear array is rank k ambiguous if it can be
decomposed into k subarrays (which may be reduced to one
sensor) with spacing a > k\lambda/2. The corresponding sets of
ambiguous directions are obtained by
the following geometrical construction (given here for
k = 4).


Fig.  9  :  Determination  of  the  ambiguous  directions  of 
arrival  for  a  linear  array. 


The projections of the vectors \vec{u}_i on the antenna direction
must lie on a grid with spacing \lambda/a. Every translated grid
also provides ambiguous directions.

7.  Conclusion 

We propose a general framework to study ambiguities
for general arrays and give some properties of ambiguous
array geometry. This study generalises results previously
obtained in the literature.
Abstract

The problem of estimating or tracking the time-varying
principal components of a data covariance is considered.
We assert that the incorporation of some notion of subspace
motion or dynamics will make possible the application
of subspace-based direction-finding or beamforming
algorithms in scenarios which otherwise would be considered
data-starved. An ordinary differential equation for
simple uniform motion in the space of projection matrices
is developed. This dynamical model is then used along
with the artificial assumption of subspace sphericalization
in a Gaussian data model, from which the cost function for
maximum-likelihood estimation of subspace motion
parameters is derived. Approaches to computing these
subspace parameters in the one-dimensional case are
proposed.


1.  Introduction 

For many array processing problems of interest, in
both military applications and in commercial applications
such as wireless mobile communications, one wishes to
perform subspace-based high-resolution direction-finding
and beamforming in a time-varying environment. Much
has been reported in recent years on the subspace tracking
problem, that of adaptively determining the principal
components or subspace of a data covariance which is
evolving in time. The state of the art up to 1991 is
described in an excellent survey paper by Comon and
Golub [1], and several other promising algorithms have
been proposed since then, e.g. [2-4].

In all of the algorithms reported in the literature,
only the simplest of models is used to describe the motion
or rotation of the subspace under consideration. This
model is in some sense 1st-order Markov, by which we
mean that one's best estimate of the subspace several time
units into the future is simply the current value of the
subspace. In these algorithms, one does not take into
account any predictive value that the observed motion of
the estimated subspace may have over time.

This lack of a predictive dynamic model for the subspace
comes primarily from the fact that a subspace cannot
easily be equated with a vector quantity moving about in a
finite-dimensional vector space, described by a conventional
state-space model. If this were the case, then a
straightforward application of Kalman filtering would be
an obvious approach to the problem. What we seek is a
model which naturally describes the evolution of an
M-dimensional subspace of C^N, which is not an element of a
Euclidean space. Given such a model, we could then consider
how to use its predictive capability
in a subspace tracker.

Our study of the subspace tracking problem involves
two parts: a dynamical model for time-varying subspaces,
and a data model which relates the observations to the
desired underlying parameters. These topics are addressed
in Sections 2 and 3, respectively. The general estimation
problem posed at the end of Section 3 remains open. A
special case in which the dimension of the subspace is 1 is
of interest and is treated in Section 4. Results and
conclusions follow.


2.  Dynamical  Model  for  Subspaces 

Consider \mathcal{X}_{N,M} to be the space of M-dimensional
subspaces of C^N. X \in \mathcal{X}_{N,M} is a subspace in C^N uniquely
specified by P, an N x N projection matrix of rank M. P
has the two properties that it is Hermitian (P = P^H) and
idempotent (P^2 = P). The set of all N x N rank-M projection
matrices forms a connected manifold which is denoted
\mathcal{P}_{N,M}. We seek a natural dynamical description on \mathcal{X}_{N,M},
or equivalently on the set of coordinate descriptions \mathcal{P}_{N,M}.
Subspace X can be thought of as an M-dimensional complex
plane extending to infinity in all directions and containing
the origin, hence its motion is basically rotational.
In order to describe its motion, we borrow ideas from the
rigid body dynamics of objects undergoing rotations with
respect to the origin of some coordinate system (such as an
airplane), and extend them to higher dimensions and complex
spaces.

Consider a rigid body with the origin of its coordinate
system fixed, rotating freely about this origin. At any
time t, the orientation of this body is described by a unitary
Q(t), sometimes called a rotation matrix. If x is the
coordinate vector for some point on the body in its nominal
orientation, then y(t) = Q(t)x gives the coordinate vector
for this same point in the rotated body.

A natural differential equation for time-varying rotations
is

\dot{Q}(t) = Q(t) A(t)    (2.1)

where A(t) is a skew-symmetric matrix. To see that this is
the case, note that

Q(t) Q^T(t) = I    (2.2)

and hence

\dot{Q}(t) Q^T(t) + Q(t) \dot{Q}^T(t) = 0 .    (2.3)

Defining

A(t) = -\dot{Q}^T(t) Q(t)    (2.4)

(2.1) follows immediately, and furthermore from (2.3) we
have that

A^T(t) = -A(t) .    (2.5)

In the special case where A is a constant skew-symmetric
matrix, we have the closed-form solution to
(2.1) as

Q(t) = Q(0) e^{At} .    (2.6)

In higher-dimensional complex spaces C^N, an analogous
result holds. We can write

\dot{Q}(t) = Q(t) A(t)    (2.7)

where A is skew-Hermitian (A^H = -A) and given by

A(t) = -\dot{Q}^H(t) Q(t) .    (2.8)

Skew-Hermitian matrices have the property that the eigenvalues
are purely imaginary, and eigenvectors corresponding
to different eigenvalues are orthogonal. If A is held
constant, then the closed-form solution to (2.7) is

Q(t) = Q(0) e^{At} .    (2.9)

This dynamical model for time-varying rotations
Q(t) is not our objective here, although it will be relevant.
Rather, we seek a description of the motion of a point in
\mathcal{X}_{N,M}, which perhaps can be visualized as a "rigid body"
consisting of an infinite sheet fixed at the origin and rotating
under the influence of various torques and forces. We
choose the projection matrix P as a unique coordinate
description for a point in \mathcal{X}_{N,M}, and thus consider the flow
on \mathcal{P}_{N,M}.

The use of dynamical models for P(t) was introduced
by Dowling and DeGroat in [5]. There a differential
equation for P(t) was proposed, not for the purpose of
developing a subspace tracker, but rather for establishing
the convergence properties of a previously developed
algorithm [2]. The proposed ODE was a Riccati equation of
the form

\dot{P}(t) = P(t)R + RP(t) - 2P(t)RP(t)    (2.10)

where R is a positive definite Hermitian matrix. The
global attractor for this ODE is the projection operator for
the true signal subspace of R.

We propose here a simpler ODE for P(t) which
describes something akin to "uniform motion" on \mathcal{P}_{N,M}. It
is

\dot{P}(t) = P(t)A(t) - A(t)P(t)    (2.11)

for skew-Hermitian A(t). To see that (2.11) generates a
flow on \mathcal{P}_{N,M}, note that the two defining properties of
projection matrices require that

\dot{P}^H(t) = \dot{P}(t)    (2.12)

and

\dot{P}(t)P(t) + P(t)\dot{P}(t) = \dot{P}(t)    (2.13)

both of which are satisfied by (2.11) provided that A is
skew-Hermitian. Furthermore, since P(t) is continuous for
bounded A, it cannot change rank at any time t.

If A is held constant, the solution to (2.11) is

P(t) = Q^H(t) P(0) Q(t)    (2.14)

where

Q(t) = e^{At}    (2.15)

by the previous arguments of this section. We take (2.11)
to be the simplest non-trivial model for uniform motion in
the space \mathcal{P}_{N,M}. We consider the flow to be somewhat
analogous to constant velocity motion along a straight line
for points in a Euclidean space. With this analogy in
mind, we now consider the problem of estimating the
parameters of this uniform motion, given observations
from a particular stochastic model.
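The properties claimed for the flow (2.11) can be checked numerically. The following sketch is an illustration only, not part of the paper: it assumes an arbitrarily chosen 3 x 3 skew-Hermitian matrix A and a rank-2 projector P(0), evaluates e^{At} with a truncated Taylor series (adequate only for small ||At||), and compares a finite-difference derivative of P(t) = Q^H(t)P(0)Q(t) with P(t)A - AP(t). All helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def herm(X):
    """Conjugate (Hermitian) transpose."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]

def expm(A, t, terms=40):
    """Truncated Taylor series for exp(A t); assumes ||A t|| is small."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[v * t / m for v in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(len(X)) for j in range(len(X[0])))

def projector(A, P0, t):
    """P(t) = Q^H(t) P(0) Q(t) with Q(t) = exp(A t), as in (2.14)-(2.15)."""
    Q = expm(A, t)
    return matmul(matmul(herm(Q), P0), Q)

# Arbitrary skew-Hermitian A (A^H = -A) and rank-2 projector P(0): assumptions.
A = [[0.3j, 0.5 + 0j, 0.2j],
     [-0.5 + 0j, -0.1j, 0.4 + 0j],
     [0.2j, -0.4 + 0j, 0.2j]]
P0 = [[1 + 0j, 0j, 0j], [0j, 1 + 0j, 0j], [0j, 0j, 0j]]

t, h = 0.7, 1e-5
P = projector(A, P0, t)
# Central finite difference approximating dP/dt.
dP = [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
      for rp, rm in zip(projector(A, P0, t + h), projector(A, P0, t - h))]
PA, AP = matmul(P, A), matmul(A, P)
rhs = [[PA[i][j] - AP[i][j] for j in range(3)] for i in range(3)]

herm_err = max_abs_diff(P, herm(P))        # P should stay Hermitian
idem_err = max_abs_diff(matmul(P, P), P)   # ... and idempotent
trace = sum(P[i][i] for i in range(3)).real  # rank is preserved (trace = M)
ode_err = max_abs_diff(dP, rhs)            # dP/dt - (PA - AP)
print(herm_err, idem_err, trace, ode_err)
```

For this choice of A and P(0), all three error measures are at the level of numerical round-off or finite-difference error, and the trace stays equal to M = 2.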
3. Data Model

The usual model often considered in the application
of subspace-based algorithms such as MUSIC or ESPRIT
is one in which the data vector x \in C^N is complex Gaussian
with mean 0 and covariance R. The shorthand notation
for this is

x ~ CN(0, R) .    (3.1)

The eigenvalues of R can be split into two classes, with
the eigenvectors of R corresponding to the larger eigenvalues
spanning the signal subspace, and the eigenvectors
corresponding to the smaller eigenvalues spanning the
noise subspace. The maximum-likelihood estimates of
these two subspaces are found by principal component
analysis: form a sample covariance matrix from the data
vectors, and compute an eigendecomposition to determine
the signal and noise subspaces.

It is interesting to note that the eigenvalues of R are
nuisance parameters if all that is desired are the two
subspaces. Furthermore, the ML estimates of the subspaces
do not change if one introduces an artificial constraint that
all of the signal subspace eigenvalues are equal (\sigma_s^2) and
that all of the noise subspace eigenvalues are equal (\sigma_n^2),
with \sigma_s^2 > \sigma_n^2. DeGroat and Dowling [2,5] refer to this
assumption as subspace sphericalization, and several
computational advantages follow from it. Under this model
the covariance R is given by

R = \sigma_s^2 P_s + \sigma_n^2 P_n    (3.2)

where P_s and P_n are projection matrices for the signal and
noise subspaces, respectively, and

P_n = I - P_s .    (3.3)

For time-varying subspaces, we propose the following
data model which incorporates the simple dynamic
model of Section 2 and subspace sphericalization. Let

x(k) ~ CN(0, R(k)),    k = 1 ... K    (3.4)

where

R(k) = \sigma_s^2 P_s(kT) + \sigma_n^2 P_n(kT) .    (3.5)

P_s(t) is evolving in time according to the differential
equation

\dot{P}_s(t) = P_s(t)A - A P_s(t)    (3.6)

with A held constant. From the results of the previous
section we have

P_s(kT) = e^{-AkT} P_0 e^{AkT} .    (3.7)

The problem of tracking P_s(t) reduces, under this model,
to that of estimating P_0 and A (or Q = e^{AT}).


The probability density function for the observed
data is

f(x(1), ..., x(K)) = \prod_{k=1}^{K} \pi^{-N} |R(k)|^{-1} \exp\{-x^H(k) R^{-1}(k) x(k)\}    (3.8)

where

R^{-1}(k) = \sigma_s^{-2} P_s(kT) + \sigma_n^{-2} P_n(kT) .    (3.9)

Straightforward manipulation of (3.8-3.9) reveals that the
ML estimates of the parameters P_0 and A are not functions
of \sigma_s^2 and \sigma_n^2, and that they are found via maximization of
the cost function

J = \sum_{k=1}^{K} x^H(k) e^{-AkT} P_0 e^{AkT} x(k) .    (3.10)

The interpretation of (3.10) is intuitive: we seek a set of
rotations e^{AkT}, which, when applied in sequence to the
data vectors x(k), would move them as closely as possible
back to a single M-dimensional subspace described by P_0.

At the time of this writing we do not have a general
closed form solution for the maximization of J, and consider
it to be an open research problem.
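The manipulation leading to the cost function can be spelled out as follows. This is a sketch under the sphericalized model, not the authors' own derivation: it assumes R(k) = \sigma_s^2 P_s(kT) + \sigma_n^2 (I - P_s(kT)) with P_s(kT) a rank-M projector, and writes f for the joint density of the K independent observations.

```latex
% Assumption: R(k) = \sigma_s^2 P_s(kT) + \sigma_n^2 (I - P_s(kT)), rank P_s = M
\begin{align*}
\det R(k) &= \sigma_s^{2M}\,\sigma_n^{2(N-M)}
  \quad\text{(independent of $P_0$ and $A$)},\\
x^H(k)\,R^{-1}(k)\,x(k) &= \sigma_n^{-2}\,\|x(k)\|^2
  - \bigl(\sigma_n^{-2}-\sigma_s^{-2}\bigr)\,x^H(k)\,P_s(kT)\,x(k),\\
\log f &= \text{const}
  + \bigl(\sigma_n^{-2}-\sigma_s^{-2}\bigr)\sum_{k=1}^{K} x^H(k)\,P_s(kT)\,x(k).
\end{align*}
```

Since \sigma_s^2 > \sigma_n^2, the coefficient multiplying the sum is positive, so maximizing the likelihood over P_0 and A amounts to maximizing the sum itself, whatever the values of \sigma_s^2 and \sigma_n^2.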

4.  The  One-Dimensional  Case 

The special case M = 1 deserves special attention.
When M = 1, each data vector x(k) can support a crude
estimate of P(k), independent of the other data, and these
estimates can be used in various ad hoc ways to approximate
the matrix Q = e^{AT} which relates them all.

For the sake of argument, let us suppose that each
data vector x(k) lies in the range of the one-dimensional
projection P(k). Define the normalized observation

u(k) = x(k)/|x(k)|    (4.1)

and then

P(k) = u(k)u^H(k) .    (4.2)

From (2.14) we have the recurrence relation

P(k) = Q^H P(k-1) Q .    (4.3)

Note that (4.3) is not equivalent to the statement
u(k) = Q^H u(k-1), since we only know that u(k) lies in
the range of P(k). However, we can say that

u(k)d(k) = Q^H u(k-1)    (4.4)

where d(k) is an unknown complex scalar of the form

d(k) = e^{j\theta_k} .    (4.5)
One promising approach to the estimation of Q,
which can be interpreted as the small rotation applied to
the subspace of interest at each time step, is to find
parameters d(k) and Q which make (4.4) approximately true for
k = 2, ..., K. Define

A = [u(2) | ... | u(K)]    (4.6)

and

B = [u(1) | ... | u(K-1)] .    (4.7)

Then in matrix form (4.4) becomes

AD = Q^H B    (4.8)

where 

D = diag(d(2), ..., d(K)) .    (4.9)

Given  sufficient  data,  (4.8)  will  not  be  exactly  true, 
but  we  can  make  it  approximately  true  by  defining  an 
appropriate  cost  function  and  minimizing  it  with  respect  to 
the  unknown  parameters  D  and  Q.  The  squared-error  cost 
function  is 

H = tr (AD - Q^H B)(AD - Q^H B)^H .    (4.10)

There are closed-form solutions for the two problems of
minimizing H w.r.t. Q holding D fixed, and vice versa.

Fix D in (4.10), and let

\tilde{A} = AD .    (4.11)

The cost function becomes

H = tr (B - Q\tilde{A})(B - Q\tilde{A})^H .    (4.12)

Expanding, we have

H = tr (BB^H - Q\tilde{A}B^H - B\tilde{A}^H Q^H + \tilde{A}\tilde{A}^H)    (4.13)

and thus H is minimized when we maximize

H' = Re{tr Q\tilde{A}B^H} .    (4.14)

Let the singular value decomposition (SVD) of \tilde{A}B^H be
given by

\tilde{A}B^H = U \Sigma V^H .    (4.15)

Then

H' = Re{tr Q U \Sigma V^H} .    (4.16)

H' is upper-bounded by tr \Sigma and this bound can be met
with equality when

Q = V U^H .    (4.17)

This gives a constructive method for finding the optimal Q
when D is fixed.


Likewise, when Q is fixed, let

\tilde{B} = Q^H B    (4.18)

and solve for D using least-squares. This leads to K
decoupled problems because of the diagonal constraint on
D. Define a_k and \tilde{b}_k to be the kth columns of A and \tilde{B},
respectively. Then the least-squares solution for d_k is
given by d_k = e^{j\theta_k} where

\theta_k = arg(a_k^H \tilde{b}_k) .    (4.19)
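The phase update (4.19) is simple enough to sketch directly. The code below is an illustration, not the authors' implementation: it assumes the columns a_k of A and the columns of Q^H B are given as lists of complex numbers, and the function names and test data are hypothetical.

```python
import cmath

def phase_step(A_cols, B_cols):
    """For each column pair (a_k, b_k) return d_k = exp(j*theta_k) with
    theta_k = arg(a_k^H b_k), the unit-modulus minimizer of ||a_k d - b_k||^2."""
    d = []
    for a, b in zip(A_cols, B_cols):
        inner = sum(x.conjugate() * y for x, y in zip(a, b))  # a_k^H b_k
        d.append(cmath.exp(1j * cmath.phase(inner)))
    return d

def residual(a, b, d):
    """Squared error ||a d - b||^2 for one column pair."""
    return sum(abs(x * d - y) ** 2 for x, y in zip(a, b))

# Hypothetical data: each b_k equals the unit-norm a_k rotated by a known phase.
a_cols = [[0.6 + 0.0j, 0.8j], [1 / 2 ** 0.5 + 0j, -1 / 2 ** 0.5 + 0j]]
true_d = [cmath.exp(0.9j), cmath.exp(-2.1j)]
b_cols = [[x * d for x in a] for a, d in zip(a_cols, true_d)]

d_hat = phase_step(a_cols, b_cols)
print(d_hat)
```

On this noise-free data the recovered phases coincide with the ones used to generate it. In an alternating scheme this step would be interleaved with the SVD-based update (4.17) for Q, which is not shown here.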


5.  Conclusion 

We have proposed the use of a dynamic predictive
model for subspaces for application in the subspace tracking
problem. A model for simple uniform motion in the
space of projection matrices has been developed. This
model, combined with a complex Gaussian data model,
leads to a maximum-likelihood estimation problem where
the parameters of interest are the subspace motion
parameters.

Successful incorporation of subspace dynamics into
the tracking problem will extend the range of useful
applications of subspace methods in high-resolution
direction-finding and beamforming. The work described in this
paper represents a preliminary investigation of this
potentially important problem.
Abstract

We consider the problem of estimating the parameters of
an unknown multi-input multi-output linear system, and the
related problem of deconvolving and recovering its inputs,
using only observations of the system outputs. We derive
simple closed-form asymptotic expressions for the Cramér-Rao
lower bound (CRLB) for the system parameters, as
well as lower bounds on the signal reconstruction performance.
These show that the identification/deconvolution
performance depends on the accuracy with which the scale
and the location parameters of the input probability density
functions can be identified from observation of the input
signals. It is also shown that the CRLB possesses a
block diagonal structure, indicating that the general
multichannel deconvolution problem is decoupled into two
independent simpler sub-problems: the signal separation
problem, where the unknown system is deconvolved to a diagonal
one, and the remaining independent single-channel
deconvolution problems associated with the equalization of each
of its diagonal elements.


1  Introduction 


In many applications, observations are made on the outputs
of an unknown multi-input multi-output (MIMO) linear
system, from which it is necessary to identify the system
and recover its inputs. A classical example is the problem
of separating several speakers using multiple microphone
measurements. The unknown system in this case represents
the acoustic medium which couples the speakers to the
microphones, including all of its multipath and reverberation
effects. Another example, receiving growing attention
recently, is the problem associated with the recovery of data
communication signals that share the same frequency band.
Multiple receivers are typically used to decouple and
reconstruct the original transmitted information from its
superimposed and distorted observations. Here, the system to
be identified and deconvolved is the MIMO communication
channel which links the information sources to the receivers.
Similar problems can be found in diverse fields of engineering
and applied science including radar/sonar array processing,
seismic exploration, radio astronomy, econometrics,
and more.

* This work was supported by the Office of Naval Research under
contract no. N00014-95-1-0912, and by the University of California MICRO
program and Applied Signal Technology, Inc.

2  Problem  Formulation 

We consider the two closely related problems of multichannel
system identification and deconvolution, in which
we observe the outputs y_1(t), ..., y_N(t) of an unknown N x
N stable linear time invariant (LTI) system \mathcal{H}, whose
(unobserved) inputs are s_1(t), ..., s_N(t) and whose frequency
response is:

\mathcal{H}(\omega) = \begin{bmatrix} H_{11}(\omega) & \cdots & H_{1N}(\omega) \\ \vdots & \ddots & \vdots \\ H_{N1}(\omega) & \cdots & H_{NN}(\omega) \end{bmatrix}    (1)

Thus, the entry H_{ij}(\omega) of \mathcal{H}(\omega) denotes the frequency
response of the SISO system that couples input s_j(t) to output
y_i(t).

Based on observation of y_1(t), ..., y_N(t) we want to
identify \mathcal{H}, and/or deconvolve and recover its inputs using
an N x N reconstruction system \mathcal{G}, whose inputs are
y_1(t), ..., y_N(t) and whose outputs are \hat{s}_1(t), ..., \hat{s}_N(t).

Let \mathcal{A} denote the combined system relating the
reconstructed signals \hat{s}_1(t), ..., \hat{s}_N(t) to the input signals
s_1(t), ..., s_N(t). Then,

\mathcal{A}(\omega) = \mathcal{G}(\omega) \mathcal{H}(\omega)    (2)

where \mathcal{A}(\omega) and \mathcal{G}(\omega) are the frequency responses of the
systems \mathcal{A} and \mathcal{G}, respectively, and \mathcal{H}(\omega) is given by (1).
Assuming that \mathcal{H} is an invertible system, we want to set

\mathcal{G}(\omega) = \mathcal{H}^{-1}(\omega)    (3)

which according to (2) implies that \mathcal{A} is a unity transformation,
in which case \hat{s}_i(t) = s_i(t) and the inputs are exactly
recovered.

The following two assumptions will be used throughout:

Assumption I The input signals are sample functions
(realizations) from mutually independent stochastic
processes.

Assumption II Each of the input signals constitutes a
sequence of independent and identically distributed
(i.i.d.) random variables.

Assumptions I and II are sufficient (although in some special
cases not necessary) for the identification of \mathcal{H} and the
recovery of its inputs. In the following sections, we study
how accurately these two tasks can be performed.

3  The  Cramer-Rao  Lower  Bound 


Derivation of the exact Cramér-Rao lower bound
(CRLB) for the problem at hand is generally intractable. We
will therefore develop the asymptotic bound only, in which
case the "end effects" can be ignored. A similar approach
was used in e.g. [2] [5] [8] [10] [11] for various SISO
identification/deconvolution problems. Furthermore, in order to
simplify the analysis and to gain several important insights,
we first compute the CRLB with respect to the components
of the combined system \mathcal{A}, as if they were free adjustable
parameters. We then use the relation in (2) to derive the bound
with respect to the components of \mathcal{H}. As we shall see in
Section 4, the accuracy with which the system \mathcal{A} can be identified
is by itself an important issue, as it governs the deconvolution
performance.

3.1 The CRLB for \mathcal{A}

Let quantities with over-bar denote the true parameter
values. Thus, \bar{\mathcal{H}} stands for the system that actually
generated y_1(t), ..., y_N(t). We set \mathcal{G} to be the inverse of \bar{\mathcal{H}}.
Therefore, we shall compute the CRLB for the system \mathcal{A}
whose true value, \bar{\mathcal{A}}, is the N x N unity system.

Suppose that T observations of the signals
\hat{s}_1(t), ..., \hat{s}_N(t) are available. Then, the log-likelihood of
the observed data is asymptotically given by:

-\log|\det\{\mathbf{A}\}| + \sum_{t=1}^{T} \log[p_{s_1}(v_1(t))] + \cdots + \sum_{t=1}^{T} \log[p_{s_N}(v_N(t))]    (4)

where \mathbf{A} is an NT x NT block matrix whose i, j block is the
T x T Toeplitz matrix containing the unit sample response
coefficients of \mathcal{A}_{ij}; v_1(t), ..., v_N(t) are the outputs of the
N x N system whose frequency response is \mathcal{A}(\omega)^{-1} and
whose inputs are \hat{s}_1(t), ..., \hat{s}_N(t); and p_{s_i}(x) stands for the
pdf of s_i(t), which is assumed to be strictly positive and
differentiable for all x.

Time Domain Parameterization

Suppose that the unknown parameters are the unit sample
response coefficients a_{ij}(\tau), for i, j \in {1, ..., N} and
\tau \in {0, ±1, ..., ±K}. The non-negative integer K is assumed
to be much smaller than T, so that the total number
of unknowns is small compared to the data block size.

Define the log-likelihood gradient \ell_{ij}(\tau) as the derivative
of (4) with respect to a_{ij}(\tau) at \mathcal{A} = \bar{\mathcal{A}}, where a_{ij}(\tau) is the
unit sample response of \mathcal{A}_{ij}. Then,

\ell_{ii}(\tau) = -T (\delta(\tau) + R_{z_i s_i}(\tau))    (5)

\ell_{ij}(\tau) = -T R_{z_i s_j}(\tau),    i \neq j    (6)

where \delta(\cdot) denotes the Kronecker delta function, and

z_i(t)    (7)

R_{z_i s_j}(\tau)    (8)

Thus, the non-zero elements of the Fisher Information
matrix (FIM) are given by:

E\{\ell_{ii}(\tau_1) \ell_{ii}(\tau_2)\} = T [ cum\{z_i, z_i, s_i, s_i\} \delta(\tau_1) \delta(\tau_2) + \mathcal{L}_i \delta(\tau_1 - \tau_2) + \delta(\tau_1 + \tau_2) ]    (9)

E\{\ell_{ij}(\tau_1) \ell_{ij}(\tau_2)\} = T \frac{Var\{s_j\}}{Var\{s_i\}} \mathcal{L}_i \delta(\tau_1 - \tau_2)    (10)

E\{\ell_{ij}(\tau_1) \ell_{ji}(\tau_2)\} = T \delta(\tau_1 + \tau_2)    (11)

where cum{ } denotes the joint cumulant of the random
variables in the brackets, and

\mathcal{L}_i = Var\{z_i\} Var\{s_i\}    (12)

which is known as Fisher's information for the location
parameter (FIL).

All other elements of the FIM are zero, indicating lack of
statistical correlation between the estimates of the diagonal
and the off-diagonal elements of \mathcal{A}. Furthermore, the
estimates of a_{ii}(\tau) are uncorrelated with the estimates of a_{jj}(\tau)
for i \neq j. Therefore, at least asymptotically, the general
N x N identification/deconvolution problem can be decoupled
into two independent simpler sub-problems: the signal
separation problem associated with the identification of
the a_{ij}(\tau)s with i \neq j, and the remaining N independent
single-channel problems in which the a_{ii}(\tau)s are specified.
Furthermore, the N x N signal separation problem further
decouples into \binom{N}{2} independent pairwise separation
problems.

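The CRLB on the error variance of any unbiased estimate
\hat{a}_{ij}(\tau) of a_{ij}(\tau) is given by the inverse of the FIM,

Var\{\hat{a}_{ij}(\tau)\} \geq \frac{1}{T} \frac{Var\{s_i\}}{Var\{s_j\}} \frac{\mathcal{L}_j}{\mathcal{L}_i \mathcal{L}_j - 1}    (13)

where for i = j and \tau = 0, the term \mathcal{L}_j / (\mathcal{L}_i \mathcal{L}_j - 1) should be
replaced by \mathcal{S}_i^{-1}, with \mathcal{S}_i being the Fisher information for the
scale parameter (FIS),

\mathcal{S}_i = cum\{z_i, z_i, s_i, s_i\} + \mathcal{L}_i + 1    (14)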


The simple structure of the bound in (13) indicates that,
essentially, it is the FILs (or FISs) that govern the estimation
accuracy of a_{ij}(\tau). Thus, the identification of the combined
system \mathcal{A} is strongly related to the basic problem of
estimating the scale and location parameters of the input
distributions based on direct observation of the input signals. In
fact, if Var\{s_i(t)\} = 1 and \mathcal{L}_i \gg 1, i \in {1, ..., N}, then
for all combinations of i, j and \tau except i = j, \tau = 0, the
RHS of (13) coincides with the CRLB for the estimation of
the location parameter of p_{s_i}(x) given T independent
realizations. Similarly, for i = j and \tau = 0, the RHS of (13)
coincides with the CRLB for the estimation of the scale
parameter of p_{s_i}(x).

Due to the block diagonal nature of the FIM, the RHS of
(13) with i = j is the CRLB for estimating a_{ii}(\tau) given
that all the other components of the system \mathcal{A} are known
a priori. Therefore, it coincides with the result of [2], where
the CRLB for the identification of a SISO system from
observation of its output was derived.

Similarly, the RHS of (13) with i \neq j coincides with the
CRLB for estimating a_{ij}(\tau) given that all the other components
of \mathcal{A} except a_{ji}(\tau) are known a priori. Thus, lack
of precise knowledge of the a_{ii}(\tau)'s does not affect the
estimation of the a_{ij}(\tau)'s with i \neq j, and vice versa.
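The way (13) follows from the FIM can be illustrated for one off-diagonal pair. The sketch below is not from the paper: it assumes that (10) reads T (Var{s_j}/Var{s_i}) L_i \delta(\tau_1 - \tau_2) and that (11) couples a_ij(\tau) with a_ji(-\tau), so the two parameters form a 2 x 2 Fisher information block. The function names and numerical values are hypothetical.

```python
def pair_fim(T, var_i, var_j, L_i, L_j):
    """2x2 Fisher information block for (a_ij(tau), a_ji(-tau)), i != j,
    assembled from (10) and (11) under the reading stated above."""
    return [[T * (var_j / var_i) * L_i, T],
            [T, T * (var_i / var_j) * L_j]]

def invert_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular Fisher information block")
    return [[d / det, -b / det], [-c / det, a / det]]

def crlb_closed_form(T, var_i, var_j, L_i, L_j):
    """Right-hand side of (13) for i != j."""
    return (1.0 / T) * (var_i / var_j) * L_j / (L_i * L_j - 1.0)

# Arbitrary illustrative values (assumptions, not data from the paper).
T, var_i, var_j, L_i, L_j = 1000, 2.0, 0.5, 2.0, 1.5
bound_from_inverse = invert_2x2(pair_fim(T, var_i, var_j, L_i, L_j))[0][0]
bound_closed_form = crlb_closed_form(T, var_i, var_j, L_i, L_j)
print(bound_from_inverse, bound_closed_form)
```

For these values the leading diagonal entry of the inverted block and the closed-form expression agree (both about 0.003). The block becomes singular when L_i L_j = 1.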

Frequency  Domain  Parameterization 

Next consider the frequency domain formulation in
which the unknown parameters are set to be A_{ij}(\omega) =
\sum_{\tau} a_{ij}(\tau) e^{-j\omega\tau} at the DFT frequencies

\omega \in \{2\pi B k ;  k = 0, 1, ..., \frac{1}{B} - 1\},

where we assume that 1/B is an integer and that 1 > B >
1/T. As before, the overall number of unknown parameters is
small compared to the data block size.


By analogy to the time domain formulation, one can
show that the CRLB is given by:

Var\{\hat{A}_{ij}(\omega)\} \geq \frac{1}{BT} \frac{Var\{s_i\}}{Var\{s_j\}} \frac{\mathcal{L}_j}{\mathcal{L}_i \mathcal{L}_j - 1}    (15)

where  for  i  =  j  and  w  =  0,  ±n,  the  term  should  be 
replaced  by 

Observe that (15) is identical, up to the factor of 1/B, to
the CRLB for the time domain parameterization given in
(13). Therefore, the time and frequency domain parameterizations
are essentially equivalent. However, due to the windowing
operation used in the frequency domain parameterization,
the data block is effectively smaller in that case (by
a factor of 1/B).

Note also that if s_1(t), ..., s_N(t) are jointly Gaussian,
then \mathcal{L}_i = 1 and the RHS of (15) is infinite, indicating
that the problem cannot be solved in this case. The reason
is that in the Gaussian case all the available information
about the unknown system is contained in the first
and second-order statistics of the observed signals. These
statistics are "blind" to unitary transformations on the data
signals. Therefore, the system may only be identified up to
an arbitrary unitary transformation.

Furthermore, consider for simplicity the case N = 2 and
suppose that one of the inputs, say s_1(t), is non-Gaussian.
Then, the performance is the worst when the other signal
s_2(t) is Gaussian. The variances of \hat{A}_{12}(\omega) and \hat{A}_{21}(\omega)
are the highest in this case, and the variance of \hat{A}_{22}(\omega)
is infinite, in accordance with the well known fact that a
SISO system driven by a Gaussian process cannot be
identified/deconvolved. To verify this, note that \mathcal{L}_2 \geq 1 with
equality if and only if s_2(t) is Gaussian (see e.g. [2] [10]
[11]), and the RHS of (15) is always a monotone decreasing
function of \mathcal{L}_2.
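The Gaussian degeneracy and the monotone behaviour can be seen by evaluating the right-hand side of (15) directly. The sketch below is an illustration only: the function name is hypothetical, the values of B and T are arbitrary, L = 1 for a Gaussian input is taken from the text, and L = 2 for a Laplacian input is a standard value quoted here under the assumption that L_i is the variance-normalised Fisher information for location.

```python
import math

def crlb_freq(B, T, var_i, var_j, L_i, L_j):
    """Right-hand side of (15); infinite when L_i * L_j = 1."""
    denom = L_i * L_j - 1.0
    if denom <= 0.0:
        return math.inf
    return (1.0 / (B * T)) * (var_i / var_j) * L_j / denom

B, T = 0.1, 10000          # arbitrary illustrative values
L_GAUSS = 1.0              # stated in the text for Gaussian inputs
L_LAPLACE = 2.0            # assumption: value for a Laplacian density

# N = 2, unit variances, s_1 Laplacian and s_2 Gaussian.
both_gaussian = crlb_freq(B, T, 1.0, 1.0, L_GAUSS, L_GAUSS)
a12_bound = crlb_freq(B, T, 1.0, 1.0, L_LAPLACE, L_GAUSS)   # i = 1, j = 2
a22_bound = crlb_freq(B, T, 1.0, 1.0, L_GAUSS, L_GAUSS)     # i = j = 2

# Sweep L_2 upward from the Gaussian value: the bound on A_12 decreases.
sweep = [crlb_freq(B, T, 1.0, 1.0, L_LAPLACE, L2) for L2 in (1.0, 1.5, 2.0, 4.0)]
print(both_gaussian, a12_bound, a22_bound, sweep)
```

With these values the bound is infinite when both inputs are Gaussian and for the (2,2) entry when only s_2 is Gaussian, while the bound on the (1,2) entry is finite and largest at L_2 = 1.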


3.2 The CRLB for \mathcal{H}


Recall that we were originally interested in the CRLB
with respect to the components of \mathcal{H}. Since \mathcal{H} and \mathcal{A} are
related through the linear transformation \mathcal{G}, we can translate
the results above to a bound on the components of \mathcal{H}.
With the frequency domain parameterization we obtain simple
closed form expressions. Using the relation in (2), which
holds approximately for the DFT frequencies, the asymptotic
CRLB on the estimate \hat{H}_{ij}(\omega) of H_{ij}(\omega) is:


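Var\{\hat{H}_{ij}(\omega)\} \geq \frac{1}{BT} \sum_{k=1}^{N} |H_{ik}(\omega)|^2 \frac{Var\{s_k\}}{Var\{s_j\}} \frac{\mathcal{L}_j}{\mathcal{L}_k \mathcal{L}_j - 1}    (16)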

where for $k = j$ and $\omega = 0, \pm\pi$ the term 7^ should be
replaced by


4 Signal Reconstruction


In the previous section we addressed the system identification
problem, and examined how accurately the systems
$\mathcal{A}$ and $\mathcal{H}$ can be identified. Next we consider the signal
deconvolution issue, and determine how precisely the inputs
can be recovered.

A common measure of signal reconstruction is the
interference-to-signal power ratio at each of the reconstruction
filter outputs. Invoking (15) and the model assumptions,
it is not difficult to verify that the interference-to-signal
ratio at the ith output terminal is bounded by:

Note that the expression in (17) does not depend on the
pre-processing interference-to-signal ratios nor on the unknown
system $\mathcal{H}$. It depends only on the basic amount of
information contained in the input signals with respect to their
location parameters.

Another  useful  measure  of  signal  reconstruction  is  the 
mean  square  restoration  error  (MSE),  defined  as: 

$$\mathrm{MSE}_i = E\{[s_i(t) - \hat{s}_i(t)]^2\} \qquad (18)$$


Once again, invoking the results of the previous section, it
can be shown that $\mathrm{MSE}_i$ is bounded by:

$$\mathrm{MSE}_i \;\ge\; \frac{N}{BT} \, \frac{\mathrm{Var}\{s_i(t)\}}{\mathcal{L}_i} \qquad (19)$$
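
For a sense of scale, the bound can be evaluated directly. The sketch below reads (19) as $\mathrm{MSE}_i \ge (N/BT)\,\mathrm{Var}\{s_i(t)\}/\mathcal{L}_i$ and treats the product $BT$ as a single number supplied by the caller, since $B$ and $T$ are defined elsewhere in the paper. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def mse_lower_bound(num_inputs, bt_product, signal_variance, info):
    """Evaluate (N / BT) * Var{s_i} / L_i for one input signal."""
    if info < 1.0:
        raise ValueError("the normalized information L_i is at least 1")
    return num_inputs / bt_product * signal_variance / info

# Hypothetical numbers: N = 2 unit-variance inputs and BT = 1000.
for info in (1.0, 2.0, 10.0):
    print(info, mse_lower_bound(2, 1000.0, 1.0, info))
```

Under this reading the bound scales inversely with $\mathcal{L}_i$: the further an input is from Gaussian, the smaller the achievable restoration error.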


Finally, we note that the bounds in (17) and (19) also hold
for the case where the system $\mathcal{H}$ has more outputs than
inputs. In such a case, one may use several different sets
of N outputs of the system $\mathcal{H}$ to generate different sets of
reconstructed signals, and then average over the different
reconstructed signal sets in an attempt to improve the
performance. However, the set of averaged reconstructed signals
is related to the input signals through some equivalent $N \times N$
system that has the same true value $\mathcal{A}$. Thus, the
reconstruction performance remains intact.

Of course, if additive noises are present, then the above
procedure will indeed improve performance, as the noise
contributions will be averaged out. However, for low noise
levels this improvement is expected to be small. We
therefore conclude that for sufficiently high SNR, there is
not much point in trying to increase the number of available
data sensors beyond the minimum required.
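
The argument of the last two paragraphs can be put in numbers with a toy error model. The model is an assumption made here for illustration and is not taken from the paper: each of K reconstructions carries the same system-estimation error (variance `common_var`) plus its own independent noise term (variance `noise_var`), so averaging leaves the first untouched and divides the second by K.

```python
def averaged_mse(common_var, noise_var, num_sets):
    """MSE after averaging num_sets reconstructions under the toy model."""
    return common_var + noise_var / num_sets

def relative_gain(common_var, noise_var, num_sets):
    """Fractional MSE reduction relative to using a single output set."""
    single = averaged_mse(common_var, noise_var, 1)
    return 1.0 - averaged_mse(common_var, noise_var, num_sets) / single

print(relative_gain(1.0, 0.01, 4))   # low noise: gain below 1 percent
print(relative_gain(1.0, 1.0, 4))    # strong noise: gain of 37.5 percent
```

With low noise, almost all of the error comes from the term that averaging cannot reduce, which is why extra sensors help little in that regime.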